arXiv:1505.01986v3 [cs.IT] 8 Aug 2016 


On Secrecy Capacity of Minimum Storage Regenerating Codes

Abstract. In this paper, we revisit the problem of characterizing the secrecy capacity of minimum
storage regenerating (MSR) codes under the passive $(l_1, l_2)$-eavesdropper model, where the
eavesdropper has access to the data stored on $l_1$ nodes and the repair data for an additional $l_2$ nodes. We
study it from the information-theoretic perspective. First, some general properties of MSR codes
as well as a simple and generally applicable upper bound on secrecy capacity are given. Second, a
new concept of stable MSR codes is introduced, where the stable property is shown to be closely
linked with secrecy capacity. Finally, a comprehensive and explicit result on secrecy capacity in the
linear MSR scenario is presented, which generalizes all related works in the literature and also predicts
certain results for some unexplored linear MSR codes.

1 INTRODUCTION

Distributed storage systems (DSSs) are an essential part of large scale data storage systems required for 
many new emerging distributed networking applications such as social networking, video sharing, peer 
to peer networking and large scale data centres. As is common in such storage systems, redundancy is 
indispensably introduced to ensure reliability and availability owing to frequent node failures. The main 
approaches to introduce redundancy in DSSs are through replication, erasure codes, and more recently 
using regenerating codes [7]. Erasure codes in general can achieve higher reliability for the same level of 
redundancy when compared to the schemes that provide replication |B:. Regenerating codes are a recent 
innovation of erasure codes that has efficient performance on repair of failed nodes in DSSs [8]. 

1.1 Regenerating Codes. 

Regenerating codes [7] are a family of maximum distance separable (MDS) codes determined by a tradeoff
between the amount of storage per node and the repair bandwidth. In the framework of regenerating
codes, an encoded data file is split into $n\alpha$ symbols and then dispersed across $n$ nodes, where all the
symbols are drawn from a finite field $\mathbb{F}_q$ and each node stores a collection of $\alpha$ symbols. The dispersal
requires that any data collector can retrieve the original data message by connecting to any $k$
out of $n$ nodes. Node repair is accomplished by permitting a new node to connect to any $d$
helper nodes among the surviving $(n-1)$ nodes and to download $\beta \le \alpha$ symbols from each of them. In the
literature, a regenerating code is represented by a parameter set $\{n, k, d, \alpha, \beta, B\}$, where $B$ is the size of the
original data message and $d\beta$ is the total amount of data transferred for node repair, termed the repair
bandwidth.

The cut-set bound based on the concept of information flow [4] requires that the parameters of a
regenerating code must necessarily satisfy:

$$B \le \sum_{i=1}^{k} \min\{\alpha, (d-i+1)\beta\}. \tag{1}$$

In [7], Dimakis et al. derive the above tradeoff between the per-node storage $\alpha$ and the repair
bandwidth $d\beta$. The codes that achieve this tradeoff curve are called optimal regenerating codes. Two
extreme points on this tradeoff curve are of particular concern, namely, the minimum bandwidth regenerating
(MBR) point and the minimum storage regenerating (MSR) point, respectively representing codes with the
least repair bandwidth and codes with the least per-node storage. As shown in [7], the parameters of MBR
and MSR codes are given by:

$$(\alpha_{\mathrm{MBR}}, \beta_{\mathrm{MBR}}) = \left(\frac{2dB}{k(2d-k+1)}, \frac{2B}{k(2d-k+1)}\right), \qquad (\alpha_{\mathrm{MSR}}, \beta_{\mathrm{MSR}}) = \left(\frac{B}{k}, \frac{B}{k(d-k+1)}\right). \tag{2}$$
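As a quick numerical check of the cut-set bound (1), the following Python sketch evaluates its right-hand side and confirms that it is met with equality at both extreme points. The function names are illustrative and do not come from the paper; the two points are characterized here by the relations $\alpha = (d-k+1)\beta$ (MSR) and $\alpha = d\beta$ (MBR) that the paper uses later, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def cut_set_bound(k, d, alpha, beta):
    """Right-hand side of (1): sum_{i=1}^{k} min(alpha, (d - i + 1) * beta)."""
    return sum(min(alpha, (d - i + 1) * beta) for i in range(1, k + 1))


def msr_point(B, k, d):
    """Minimum-storage point: alpha = B / k and alpha = (d - k + 1) * beta."""
    alpha = Fraction(B, k)
    beta = alpha / (d - k + 1)
    return alpha, beta


def mbr_point(B, k, d):
    """Minimum-bandwidth point: alpha = d * beta, with beta chosen so (1) is tight."""
    beta = Fraction(2 * B, k * (2 * d - k + 1))
    alpha = d * beta
    return alpha, beta


if __name__ == "__main__":
    B, k, d = 12, 3, 4  # hypothetical toy parameters
    for name, point in (("MSR", msr_point), ("MBR", mbr_point)):
        alpha, beta = point(B, k, d)
        print(name, "alpha =", alpha, "beta =", beta,
              "bound =", cut_set_bound(k, d, alpha, beta))
```

For these toy parameters both points give a bound equal to $B$, with the MSR point storing less per node and the MBR point downloading less in total during repair.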
In the literature, three repair models are considered: functional repair, exact repair, and exact repair of
systematic nodes [8]. Exact repair regenerates exact replicas of the lost data in the failed nodes
and is thus preferred in practical systems [5]. In the exact repair scenario, Shah et al. [9] demonstrate
that most interior points on the storage-bandwidth tradeoff curve are not achievable. For those possibly
reachable interior points, constructions of codes are rare [12, 13]. In addition, Duursma [10, 11] derives
some new outer bounds for regenerating codes with exact repair.

Up to now, several constructions with the exact repair property for MBR and MSR codes have
been proposed. In [13], Rashmi et al. employ the product-matrix framework to construct MBR codes for all parameters
and MSR codes with $d \ge 2k-2$. In the MSR scenario, significant progress has been made. From
the overall perspective, there are two classes of MSR codes, i.e., scalar MSR codes with $\beta = 1$
[15, 16, 17, 18, 19, 20] and vector MSR codes with $\beta = (n-k)^x$ where $x \ge 1$ [21, 22, 23, 24, 25, 26, 27, 28, 29].
Many of these constructions are built on interference alignment. As explained in [20], interference
alignment is necessary for constructing linear scalar MSR codes, and these linear scalar MSR codes
only exist when $d \ge 2k-2$. From another point of view, since $d \le n-1$, this restriction corresponds to
the low-rate regime, i.e., $\frac{k}{n} \le \frac{1}{2} + \frac{1}{2n}$. As for high-rate codes with $\frac{k}{n} > \frac{1}{2}$, vector MSR codes are
available as they are free from the parameter constraints on $(n, k)$. However, many of these vector codes
only allow efficient repair of systematic nodes [23, 24, 25, 26, 27, 28, 29], such as Zigzag codes [23]. In [21, 22],
the authors present vector MSR codes allowing efficient repair of parity nodes as well, where the code
given in [21] is a variant of the Zigzag code.

In addition to repair efficiency, there are many other design features required by DSSs, such as security
[30, 31, 32, 33, 34, 35], local repairability [33, 38, 39, 40], optimality of updating [23, 28, 29], etc. Our concern in
this paper is securing DSSs against eavesdroppers attempting to obtain any knowledge of the original
data.


1.2 Secure Regenerating Codes. 

Since the nodes of DSSs are widely spread across the network, individual nodes may be compromised and
as a result the stored data is vulnerable to eavesdropping. Two kinds of attacker models are mainly
considered in the literature: the passive eavesdropper model and the active eavesdropper model [3]. Unlike
the former, an active eavesdropper can modify the data or even inject new data into the compromised
nodes. The eavesdropper model considered in this paper is the passive one as given in [31]. In this model, the
eavesdropper has access to the data stored on $l_1$ nodes as well as the repair data for an additional $l_2$
nodes. Here, we only consider the situation of exact repair (see footnote 1).

Footnote 1. A functional repair scheme requires continually updating the data stored in nodes undergoing repair, which
may leak a substantial number of linear combinations of the data to eavesdroppers and enable them to retrieve
the original data just by solving linear equations. This is another reason why exact repair is superior to
functional repair.

Related work: The issue of designing secure regenerating codes against eavesdropping was first
addressed in [30] and [31]. The authors in [30] considered the initial setting in which an eavesdropper observes
the contents of $l < k$ nodes of the storage system, and analyzed the regenerating code's secrecy capacity
(i.e., the maximal file size that can be securely stored). An upper bound on the secrecy capacity and a
secure MBR code that attains this bound are proposed in [30]. Extending the initial eavesdropper
setting of [30], the authors in [31] modeled the eavesdropper as one obtaining access to the data stored on $l_1$
nodes as well as the repair data for an additional $l_2$ nodes, with $l_1 + l_2 < k$. The secure product-matrix-based
MBR coding scheme proposed in [31] achieves the bound derived in [30] with $l = l_1 + l_2$.
Achievability of the bound for secure product-matrix-based MBR codes in [31] can be attributed to the
fact that the repair bandwidth $d\beta$ equals the per-node storage $\alpha$ in the MBR scenario. In other words, the
$(l_1, l_2)$-eavesdropper cannot obtain any extra information other than the contents of $l = l_1 + l_2$ nodes in
the MBR scenario. Hence, under the $(l_1, l_2)$-eavesdropper model, the upper bound in [30] still holds for
the secure file size $B^{(s)}$ of MBR codes:

$$B^{(s)} \le \sum_{i=l+1}^{k} \min\{\alpha, (d-i+1)\beta\}, \tag{3}$$

where $l = l_1 + l_2$. The authors in [31] further considered the design of secure MSR codes based on
product-matrix codes, but the secure MSR coding scheme is only capable of storing secure files of size
$(k - l_1 - l_2)(\alpha - l_2\beta)$, which reaches the bound (3) only when $l_2 = 0$. The intuition is that the
$(l_1, l_2)$-eavesdropper can obtain more information than the contents of $(l_1 + l_2)$ nodes in the MSR scenario, as
the repair bandwidth $d\beta$ is larger than $\alpha = (d-k+1)\beta$, the amount of data stored on each of
those $l_2$ nodes. As mentioned in [31], it was not yet known whether such a secure MSR code is still optimal
when $l_2 \ge 1$. Since then, characterization of the secrecy capacity of MSR codes has been considered open
under the $(l_1, l_2 > 0)$-eavesdropper model.
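To make the gap between bound (3) and the product-matrix MSR scheme concrete, here is a small Python sketch that evaluates both expressions at an MSR point. The function names and the toy parameters are assumptions made for illustration; only the two formulas themselves are taken from the text above.

```python
def secure_upper_bound(k, d, alpha, beta, l1, l2):
    """Bound (3): sum_{i=l+1}^{k} min(alpha, (d - i + 1) * beta), with l = l1 + l2."""
    l = l1 + l2
    return sum(min(alpha, (d - i + 1) * beta) for i in range(l + 1, k + 1))


def pm_msr_secure_size(k, alpha, beta, l1, l2):
    """Secure file size (k - l1 - l2) * (alpha - l2 * beta) quoted for the scheme of [31]."""
    return (k - l1 - l2) * (alpha - l2 * beta)


if __name__ == "__main__":
    # Hypothetical MSR parameters: k = 3, d = 4, beta = 1, alpha = (d - k + 1) * beta = 2.
    k, d, alpha, beta = 3, 4, 2, 1
    for l1, l2 in ((1, 0), (0, 1)):
        print((l1, l2),
              "bound (3):", secure_upper_bound(k, d, alpha, beta, l1, l2),
              "scheme:", pm_msr_secure_size(k, alpha, beta, l1, l2))
```

With these values the two quantities coincide for $(l_1, l_2) = (1, 0)$ and differ for $(l_1, l_2) = (0, 1)$, which is the situation described in the paragraph above.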

Recently, the authors in [32] and [33] employ the technique of linear subspace intersection to
derive new upper bounds on the secrecy capacity of linear MSR codes. The Zigzag code [23] and its variant [21]
are shown to achieve these bounds through pre-coding with maximum rank distance (MRD) codes [36, 37].
The bound given in [33] additionally implies that the product-matrix-based secure MSR code proposed in
[31] is also optimal for $l_2 = 1$. The bound given in [32] is actually an extension of the one in
[33], since the bound in [32] matches that in [33] when $l_2 \le 2$.

In a parallel line of research, for two separate eavesdropper models with $(l_1, l_2 = 0)$ and
$(l_1 = 0, l_2)$, the authors in [34] study the secure storage-vs-repair-bandwidth tradeoff, where they respectively
derive new outer bounds on secrecy capacity for a general parameter set and for some specific parameter sets.
Therein, they show that in the presence of an $(l_1 = 0, l_2)$-eavesdropper, these new bounds strictly improve
upon the existing cut-set-based bounds presented in [30], and that the MBR point is the only efficient point that
can achieve these specific-parameter-based bounds. Against the above background of the $(l_1, l_2)$-eavesdropper
model, our focus herein is solely on the secrecy capacity at the MSR point (see footnote 2).

Footnote 2. It is shown in [34] that, for certain parameters, secure codes operating at the MBR point actually have a better
"storage" rate (i.e., the maximal file size that can be securely stored, or simply the secrecy capacity) than
codes operating at the MSR point. In this sense, it appears that secure MSR codes lose the feature of optimal
storage, whereas the original notion of MSR codes in the non-secure setting is optimal in storage rate,
as shown in [7]. Throughout this paper, we still use the term MSR point (or MSR code) only to signify
that $\alpha$ and $\beta$ satisfy the relationship $\alpha = (d-k+1)\beta$, just as the MBR points in [34] require $\alpha = d\beta$.
Essentially, each node in a secure MSR code still stores $\alpha_{\mathrm{MSR}}$ symbols and transmits $\beta_{\mathrm{MSR}}$ symbols for
repairing a failed node; part of the stored data merely needs to be replaced with randomness.

Contributions: In this work, we first carefully review the method of determining regenerating codes
considered in [7] and the information-theoretic technique used in [9]. Therein, we find that the $\alpha$ symbols
stored in any node, and likewise the $\beta$ symbols contained in any single set of repair data, of an optimal regenerating
code are in fact mutually independent and each uniformly distributed. This not only indicates
that the entropy of every symbol involved reaches the maximal value 1, but also signifies that the entropy of the
$\alpha$ symbols in any node and the entropy of the $\beta$ symbols in any single set of repair data attain the maximal
integer values $\alpha$ and $\beta$, respectively. Thereafter, we recognize that the concepts of uniform distribution
and independence between symbols in information theory [1] correspond exactly to those of permutation
polynomial and orthogonal system in finite fields [2], respectively. Using these two theories in finite fields,
we demonstrate that the joint entropy of symbols included in multiple sets of repair data may be a
non-integer value in the nonlinear context while it must be an integer in the linear context, which will be used
to investigate the secrecy capacity of linear MSR codes.

Then, we turn to the inherent features of MSR codes from the information-theoretic perspective,
where the data stored in storage nodes and transferred by helper nodes during repair are considered as
random variables. Based on the basic reconstruction and regeneration properties of MSR codes with
$\{n = d+1, k, d, \alpha, \beta\}$, we derive two useful properties: (i) the repair data sent from disjoint sets of nodes
to a failed node are mutually independent, and (ii) given the contents of a node and the repair data from
any $k-1$ nodes, the repair data from the remaining $d-k+1$ nodes are deterministic. Combining the
two new properties with a universal upper bound on secrecy capacity for any optimal regenerating code
with $\{n = d+1, k, d, \alpha, \beta\}$, we derive a simple and generally applicable upper bound on secrecy capacity
for any MSR code with $\{n = d+1, k, d, \alpha, \beta\}$. As for MSR codes with $\{n > d+1, k, d, \alpha, \beta\}$, we
introduce a new concept of "stable" MSR codes, which require that the repair data transmitted from any
node $i$ to any failed node $j$ is independent of the choice of the set of helper nodes that includes
node $i$. Therein, we show that this stable property is equivalent to the secrecy capacity of any
MSR code with $\{n > d+1, k, d, \alpha, \beta\}$ coinciding with that of its truncated code with $\{n = d+1, k, d, \alpha, \beta\}$. It should be
noted that the product-matrix-based MSR code of Rashmi et al. is a stable MSR code.

Finally, we return to the linear MSR codes with parameter set $\{n = d+1, k, d, \alpha, \beta\}$, where
those aforementioned upper bounds on secrecy capacity can always be achieved through the
pre-coding of maximum rank distance (MRD) codes [36, 37] as applied in [33, 35]. Based on the fact that
the joint entropy of multiple sets of repair data is an integer, we fully characterize the secrecy capacity of
linear MSR codes in the category where 1 < /3 < ij ^ 1 ■ A consequence of this result when /3 = 1 
naturally establishes the optimality of product-matrix-based secure MSR codes whenever $l_1 + l_2 \le k-1$
and $l_2 \le \min\{k-1, d-k+1\}$, which completely resolves the question raised in [31]. Note that the
product-matrix-based MSR code of Rashmi et al. is a scalar MSR code, i.e., it is built on $\beta = 1$. In the other
category where /3 > d ^ l k j 1 1 • we give new upper bounds on secrecy capacity, which are in fact improved 
generalizations of the results given in [32, 33]. Thereafter, we find that all the aforementioned results also
apply to systematic MSR codes when only the repair data of systematic nodes are eavesdropped. Putting everything
together, we eventually present a comprehensive and explicit result on secrecy capacity for linear MSR
codes, which closely depends on the value of $\beta$. This final outcome also predicts certain results on secrecy
capacity for some unexplored linear MSR codes. As an illustration and comparison, Table 1 summarizes
the progress on secrecy capacity for linear MSR codes, wherein it should be noted that the bound
in [34] cannot be reached for MSR codes.


Table 1. Secrecy Capacity of Linear MSR Codes under the $(l_1, l_2)$-Eavesdropper Model


Citation

Corresponding Results

Pawar et al. [30]

$B^{(s)} \le (k - l_1 - l_2)\alpha$, optimal only when $l_2 = 0$

Tandon et al [34] 

< (k — Z 2 )(1 — for n = d + 1, Zi = 0 and 1 < I 2 < k 

Shah et al. [31]

$B^{(s)} = (k - l_1 - l_2)(\alpha - l_2\beta)$, for product-matrix-based MSR codes

Rawat et al[33l 

[ /3, for h = 1 

fl(') <(k-li-l2)(a-0((3,l 2 )), where 9(fi,h) = { na P . , n 

d+1 r f° r l * = 2 

Goparaju et al[32) 

B (s> < (k — 1 1 — Z 2 )( 1 — ^)‘ 2 q , where n = d + 1 

This paper

$B^{(s)} = (k - l_1 - l_2)(\alpha - \pi(\beta, l_2))$, wherein

, 0 , x r=«2 0 , ' if 

2> ■ \ >tp + 0{d-k-t + l)[l- , if l2=t + e,^±l<p<^^, 

where 1 < t < d — k + 1 and e > 1. This also can be referenced from our formula (1711 
Organization: Section 2 gives preliminaries consisting of some basic definitions in information theory,
the notation used in this paper, some results from the theory of finite fields, and a universal upper bound under the
$(l_1, l_2)$-eavesdropper model. Section 3 presents some new results for general MSR codes, mainly including
some general properties, some generally applicable upper bounds on secrecy capacity, and the new concept
of stable MSR codes. Section 4 presents the comprehensive and explicit result on secrecy capacity for linear
MSR codes. Section 5 concludes this paper.


2 PRELIMINARIES 

In this section, we recall some basic concepts from information theory that will be used
frequently later. Then, we describe the system model of MSR codes from the information-theoretic
perspective. Subsequently, we introduce the theory of permutation polynomials in finite fields, which can
be regarded as a new way to understand the construction of optimal regenerating codes. Finally, we
present a universal upper bound on secrecy capacity under the $(l_1, l_2)$-eavesdropper model.


2.1 Information Entropy 

Definition 1. [1] (Entropy of a Random Variable X): The entropy of a discrete random variable $X$ with
probability distribution $p_X(x)$ is

$$H(X) = -\sum_{x} p_X(x) \log p_X(x). \tag{4}$$

The entropy measures the expected uncertainty in $X$. Entropy is always non-negative, $H(X) \ge 0$,
and $H(X) = 0$ iff $X$ is deterministic. In addition, when $X$ is uniformly distributed
(i.e., $p_X(x) = 1/q$, where $q$ is the total number of values of $X$), $H(X)$ achieves its maximum value
$\log q$. Normally, the base of the logarithm can be taken to be $q$. In this case, $H(X) \le 1$, and
$H(X) = 1$ iff $X$ is uniformly distributed.
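The normalization to base $q$ can be illustrated with a few lines of Python. This is only a sketch: the helper name `entropy` and the example distributions are assumptions chosen for illustration.

```python
import math


def entropy(probs, base):
    """Entropy of a discrete distribution, with the logarithm taken to the given base."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


if __name__ == "__main__":
    q = 5  # alphabet size, also used as the base of the logarithm
    print(entropy([1 / q] * q, q))                 # uniform symbol: close to 1
    print(entropy([1.0], q))                       # deterministic symbol: 0 (may print -0.0)
    print(entropy([0.5, 0.25, 0.25, 0.0, 0.0], q)) # non-uniform symbol: strictly below 1
```

With the base fixed to the alphabet size, a uniformly distributed symbol carries one unit of entropy, which is the convention the paper relies on when it writes $H(W_i) = \alpha$.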

Definition 2. [1] (Joint Entropy and Conditional Entropy): The joint entropy of two random variables
$X$ and $Y$, and the conditional entropy of $Y$ given $X$, are respectively

$$\begin{aligned} H(X, Y) &= -\mathbb{E}_{p(x,y)}[\log p(X, Y)] = -\sum_{x}\sum_{y} p(x, y) \log p(x, y), \\ H(Y \mid X) &= -\mathbb{E}_{p(x,y)}[\log p(Y \mid X)] = \sum_{x} p(x) H(Y \mid X = x). \end{aligned} \tag{5}$$

Besides, joint and conditional entropy provide a natural calculus: $H(X, Y) = H(X) + H(Y \mid X)$.

Definition 3. [1] (Mutual Information and Conditional Mutual Information): The mutual information
between $X$ and $Y$, and the conditional mutual information between $X$ and $Y$ given another random
variable $Z$, are respectively given by:

$$\begin{aligned} I(X; Y) &= H(X) - H(X \mid Y), \\ I(X; Y \mid Z) &= H(X \mid Z) - H(X \mid Y, Z). \end{aligned} \tag{6}$$

Definition 4. [1] (Chain Rules): The chain rules for entropy and mutual information are:

$$\begin{aligned} H(X_1, \ldots, X_n) &= \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1), \\ I(X_1, \ldots, X_n; Y) &= \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, X_{i-2}, \ldots, X_1). \end{aligned} \tag{7}$$

Lemma 1. Based on these definitions of information entropy, we naturally have

$$I(X; Y \mid Z) = I(Y; X \mid Z) \le \min\{H(X), H(Y)\}. \tag{8}$$

2.2 Notation


We follow the information-theoretic approach introduced in [9], and accordingly treat all data symbols,
including the data stored at the storage nodes and those transferred by helper nodes during repair
operations, as random variables.

Note 1 Throughout the paper, we mainly consider MSR codes with parameter set $\{n = d+1, k, d, \alpha, \beta\}$,
because any upper bound on the size of the data file that can be securely stored by a secure MSR code
with $\{n = d+1, k, d, \alpha, \beta\}$ also holds for the corresponding secure MSR code with $\{n > d+1, k, d, \alpha, \beta\}$.
In Section 3.3, we will establish the condition under which an MSR code with
$\{n > d+1, k, d, \alpha, \beta\}$ and its truncated one with $\{n = d+1, k, d, \alpha, \beta\}$ have the same secrecy capacity.

We represent nodes using indices 1 to $n$ and denote the sequence of nodes $[i, i+1, \ldots, j]$ by $[i, j]$,
where $i \le j$. We use symbols for a set $\{\ldots\}$ and a sequence $[\ldots]$ interchangeably. For any regenerating
code with parameter set $\{n = d+1, k, d, \alpha, \beta\}$, we let

■ (1). $W_i$, $i \in [1, d+1]$, denote the random variable corresponding to the content of node $i$. As proved
in [9], it must be that $H(W_i) = \alpha$ for any optimal regenerating code, including MSR codes.

■ (2). $\{W_A, A \subseteq [1, d+1]\}$ denote the set of random variables corresponding to the nodes in the
subset $A$. Throughout the paper, the subscript of $W$ can represent either a node index or a set of nodes,
which will be clear from the context.

■ (3). $S_i^j$, $\{i, j\} \subseteq [1, d+1]$, $i \ne j$, denote the random variable corresponding to the data symbols sent
by the helper node $i$ to the replacement of the failed node $j$. It must be that $H(S_i^j) = \beta$ for any optimal
regenerating code, including MSR codes, following from [9].

■ (4). $S_A^B$ denote the set $\{S_i^j \mid i \in A, j \in B, i \ne j, A \subseteq [1, d+1], B \subseteq [1, d+1]\}$, and in particular $S^B$
substitutes for $S_{[1,d+1]}^B$.

According to the above notation, the reconstruction and regeneration properties of any regenerating
code can be expressed as

$$\begin{aligned} H(W_{i_1}, W_{i_2}, \ldots, W_{i_k}) &= k\alpha, \quad i_j \in [1, d+1],\ j \in \{1, \ldots, k\}, \\ H(W_i \mid S^i) &= 0, \quad i \in [1, d+1]. \end{aligned} \tag{9}$$


2.3 Permutation Polynomials 


As shown in [7], $\alpha$ represents the number of symbols stored in each node and $\beta$ corresponds to the
number of symbols downloaded from a surviving node to repair a failed node. Note that the entropy of
each symbol cannot be greater than 1 and may not be an integer. Thus, in general it can only be said that $H(W_i) \le \alpha$
and $H(S_i^j) \le \beta$. Subsequently, in the context of optimal regenerating codes, Shah et al. [9] employ
information theory to derive that $H(W_i) = \alpha$ and $H(S_i^j) = \beta$, which implies that each symbol contained
in each node and in each set of repair data actually reaches the maximum entropy 1, i.e., each symbol is uniformly
distributed. Besides, it also means that the symbols within the same node $i$, and the symbols within the same
repair data $S_i^j$, are mutually independent. Although each symbol included in any repair data
$S_i^j$ is uniformly distributed and $S_i^j$ has the maximal entropy $\beta$, the joint entropy $H(S_i^{j_1}, S_i^{j_2})$
with $j_1 \ne j_2$ may not be an integer, as illustrated in the following.

We let $(y_1^i, y_2^i, \ldots, y_\alpha^i)$ denote the $\alpha$ symbols stored in node $i$, where $H(y_l^i) = 1$ for any $l \in [1, \alpha]$. In
addition, we let $(z_1^{(i,j_1)}, \ldots, z_\beta^{(i,j_1)})$ and $(z_1^{(i,j_2)}, \ldots, z_\beta^{(i,j_2)})$ be the $\beta$ symbols contained in
the repair data $S_i^{j_1}$ and $S_i^{j_2}$ respectively, where $H(z_l^{(i,j_1)}) = H(z_l^{(i,j_2)}) = 1$ for $l \in [1, \beta]$. Now consider the
joint entropy

$$H(S_i^{j_1}, S_i^{j_2}) = H\big(z_1^{(i,j_1)}, \ldots, z_\beta^{(i,j_1)}, z_1^{(i,j_2)}, \ldots, z_\beta^{(i,j_2)}\big). \tag{10}$$


In a finite field $\mathbb{F}_q$, any mapping $\tau: \mathbb{F}_q^n \to \mathbb{F}_q$ can be represented by a polynomial over $\mathbb{F}_q$ of degree
less than $q$ in each "indeterminate" through Lagrange interpolation [2]. Since all the symbols contained in
node $i$ are uniformly distributed and mutually independent, they can be regarded as
"indeterminates". So, we let

$$z_l^{(i,j_1)} = f_l(y_1^i, y_2^i, \ldots, y_\alpha^i), \quad l = 1, \ldots, \beta, \tag{11}$$

and

$$z_l^{(i,j_2)} = \tilde{f}_l(y_1^i, y_2^i, \ldots, y_\alpha^i), \quad l = 1, \ldots, \beta, \tag{12}$$

where $(f_1, f_2, \ldots, f_\beta)$ and $(\tilde{f}_1, \tilde{f}_2, \ldots, \tilde{f}_\beta)$ represent the polynomials induced by the symbols contained
in repair data $S_i^{j_1}$ and $S_i^{j_2}$ respectively. In [2], two special concepts are introduced as follows.


Definition 5. [2] (Permutation Polynomial): A polynomial $f \in \mathbb{F}_q[x_1, \ldots, x_n]$ is called a permutation
polynomial in $n$ indeterminates over $\mathbb{F}_q$ if the equation $f(x_1, \ldots, x_n) = a$ has $q^{n-1}$ solutions in $\mathbb{F}_q^n$ for
each $a \in \mathbb{F}_q$.


According to Definition 5, each value $a \in \mathbb{F}_q$ is taken with the same probability
$q^{n-1}/q^n = 1/q$ by a permutation polynomial whose arguments are independent and uniformly distributed. In this sense, a permutation polynomial corresponds exactly
to the uniform distribution in information theory (Definition 1). Since $H(z_l^{(i,j_1)}) = H(z_l^{(i,j_2)}) = 1$
for any $l \in [1, \beta]$, we know that $(f_1, f_2, \ldots, f_\beta, \tilde{f}_1, \tilde{f}_2, \ldots, \tilde{f}_\beta)$ are all permutation polynomials. Here, it
should be noted that permutation polynomials are not necessarily linear polynomials in finite fields, while
nonzero linear polynomials clearly are permutation polynomials.


Definition 6. [2] (Orthogonal System): A system of polynomials $f_1, \ldots, f_m \in \mathbb{F}_q[x_1, \ldots, x_n]$, where
$1 \le m \le n$, is said to be orthogonal in $\mathbb{F}_q$ if the system of equations

$$\begin{cases} f_1(x_1, \ldots, x_n) = a_1 \\ \quad\vdots \\ f_m(x_1, \ldots, x_n) = a_m \end{cases} \tag{13}$$

has $q^{n-m}$ solutions in $\mathbb{F}_q^n$ for each $(a_1, \ldots, a_m) \in \mathbb{F}_q^m$.

According to Definition 6 and Definition 2, we know that $(f_1, f_2, \ldots, f_\beta)$ and $(\tilde{f}_1, \tilde{f}_2, \ldots, \tilde{f}_\beta)$
respectively constitute two orthogonal systems, since $H(S_i^{j_1}) = H(S_i^{j_2}) = \beta$. Similarly, it follows that
$H(S_i^{j_1}, S_i^{j_2}) = 2\beta$ if and only if the $2\beta$ polynomials $(f_1, f_2, \ldots, f_\beta, \tilde{f}_1, \tilde{f}_2, \ldots, \tilde{f}_\beta)$ form an orthogonal
system.

However, if there exist two polynomials $f_{l_1}$ and $\tilde{f}_{l_2}$ for some $l_1, l_2 \in [1, \beta]$ that cannot form an
orthogonal system, the joint entropy $H(z_{l_1}^{(i,j_1)}, z_{l_2}^{(i,j_2)})$ of the corresponding symbols is strictly less than 2 and may be a non-integer,
which may in turn give all the symbols of the repair data $S_i^{\{j_1, j_2\}}$ a non-integer joint entropy.
Note that multiple permutation polynomials may not form an orthogonal system, while each polynomial
in an orthogonal system must be a permutation polynomial.
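The distinction between permutation polynomials and orthogonal systems can be checked by brute force over a small prime field. The Python sketch below is illustrative only: the helper names, the field $\mathbb{F}_3$, and the three example polynomials are assumptions chosen for this example and do not appear in the paper. Polynomials are passed as Python callables on a tuple of field elements.

```python
import math
from collections import Counter
from itertools import product


def value_counts(polys, q, n):
    """Count how often each value vector (f_1(x), ..., f_m(x)) occurs for x in F_q^n (q prime)."""
    return Counter(tuple(f(x) % q for f in polys) for x in product(range(q), repeat=n))


def is_orthogonal(polys, q, n):
    """Definition 6: every vector in F_q^m must have exactly q^(n - m) preimages."""
    m = len(polys)
    counts = value_counts(polys, q, n)
    return len(counts) == q ** m and all(c == q ** (n - m) for c in counts.values())


def joint_entropy(polys, q, n):
    """Joint entropy (base q) of the polynomial values when x is uniform on F_q^n."""
    total = q ** n
    return -sum(c / total * math.log(c / total, q)
                for c in value_counts(polys, q, n).values())


if __name__ == "__main__":
    q, n = 3, 2
    f1 = lambda x: x[0]              # linear, a permutation polynomial
    f2 = lambda x: x[0] + x[1] ** 2  # nonlinear, still a permutation polynomial
    g = lambda x: x[0] + x[1]        # linear
    print(is_orthogonal([f1], q, n), is_orthogonal([f2], q, n))
    print(is_orthogonal([f1, f2], q, n), joint_entropy([f1, f2], q, n))
    print(is_orthogonal([f1, g], q, n), joint_entropy([f1, g], q, n))
```

A single polynomial is a permutation polynomial exactly when it forms an orthogonal system on its own, so `is_orthogonal([f], q, n)` doubles as the test for Definition 5. In this example $f_1$ and $f_2$ are each permutation polynomials but do not form an orthogonal system, and their joint entropy falls strictly between 1 and 2.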

Nevertheless, in the linear context, the joint entropy of the symbols contained in $S_i^A$ must be an
integer, where $i \notin A$ and $A$ is any subset of $[1, d+1]$.

Lemma 2. In the scenario of linear optimal regenerating codes, the symbols contained in $S_i^A$ must have
integer-valued entropy, where $i \in [1, d+1]$, $A \subseteq [1, d+1]$ and $i \notin A$.

Proof. Assume all the $m = |A| \cdot \beta$ symbols in $S_i^A$ can be represented as

$$\{f_1(x_1, x_2, \ldots, x_\alpha), f_2(x_1, x_2, \ldots, x_\alpha), \ldots, f_m(x_1, x_2, \ldots, x_\alpha)\}, \tag{14}$$

where $(x_1, x_2, \ldots, x_\alpha)$ are the $\alpha$ symbols stored in node $i$ and $f_l$ denotes the linear polynomial for
$l \in [1, m]$. Then, we let

$$\begin{cases} f_1(x_1, x_2, \ldots, x_\alpha) = a_{11}x_1 + a_{12}x_2 + \cdots + a_{1\alpha}x_\alpha \\ f_2(x_1, x_2, \ldots, x_\alpha) = a_{21}x_1 + a_{22}x_2 + \cdots + a_{2\alpha}x_\alpha \\ \quad\vdots \\ f_m(x_1, x_2, \ldots, x_\alpha) = a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{m\alpha}x_\alpha, \end{cases} \tag{15}$$

where all the coefficients are drawn from $\mathbb{F}_q$. Equation (15) can be alternatively expressed as

$$(f_1, f_2, \ldots, f_m)^T = C \cdot (x_1, x_2, \ldots, x_\alpha)^T, \tag{16}$$

where $T$ indicates the transpose operation and $C$ denotes the generator matrix.

Since $(f_1, f_2, \ldots, f_m)$ are linear combinations of a set of independent, uniformly distributed random variables,
they all are uniformly distributed, and they are either mutually independent or some of them are
determined by the remaining ones. In fact, the value of $H(S_i^A)$ is equal to the rank of $C$, which we
denote by $r(C)$.

1. When $m \le \alpha$ and $r(C) = m$, the row vectors of $C$ are linearly independent. Then, for each
vector value $(b_1, b_2, \ldots, b_m) \in \mathbb{F}_q^m$, equation (15) has $q^{\alpha - m}$ solutions in $\mathbb{F}_q^\alpha$. Thus, each vector value
$(b_1, b_2, \ldots, b_m)$ occurs with equal probability $q^{\alpha - m}/q^{\alpha} = 1/q^m$. So, the polynomials (15) form an orthogonal
system. In this case, we can calculate that $H(S_i^A) = m = r(C)$ according to Definition 2.

2. When $r(C) < m$, the row vectors of $C$ are not linearly independent, which implies that $r(C)$ chosen
linearly independent polynomials $f_l$ determine the values of the remaining $m - r(C)$ polynomials.
Although the whole set of polynomials (15) cannot form an orthogonal system, the $r(C)$ linearly independent
polynomials still form an orthogonal system. Similar to the above case, the entropy of these $r(C)$ linearly
independent polynomials is equal to $r(C)$. Thereby, we have

$$H(S_i^A) = H(f_1, f_2, \ldots, f_m) = H(f_{l_1}, f_{l_2}, \ldots, f_{l_{r(C)}}) = r(C), \tag{17}$$

where $\{f_{l_1}, f_{l_2}, \ldots, f_{l_{r(C)}}\}$ are the $r(C)$ chosen linearly independent polynomials.

Hence, both cases indicate that $H(S_i^A) = r(C)$, while $r(C)$ must be an integer since it represents the
rank of $C$.
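The statement $H(S_i^A) = r(C)$ can be verified numerically for a small example. The Python sketch below is a minimal illustration under stated assumptions: the field is a prime field $\mathbb{F}_p$, the function names and the example matrix are hypothetical, and the modular inverse relies on `pow(x, -1, p)`, which requires Python 3.8 or later.

```python
import math
from collections import Counter
from itertools import product


def rank_mod_p(rows, p):
    """Rank of an integer matrix over the prime field F_p, by Gaussian elimination."""
    mat = [[v % p for v in row] for row in rows]
    rank = 0
    ncols = len(mat[0]) if mat else 0
    for col in range(ncols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col]), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        inv = pow(mat[rank][col], -1, p)  # modular inverse of the pivot
        mat[rank] = [(v * inv) % p for v in mat[rank]]
        for r in range(len(mat)):
            if r != rank and mat[r][col]:
                factor = mat[r][col]
                mat[r] = [(a - factor * b) % p for a, b in zip(mat[r], mat[rank])]
        rank += 1
    return rank


def linear_entropy(rows, p):
    """Brute-force entropy (base p) of C x when x is uniform on F_p^alpha."""
    alpha = len(rows[0])
    counts = Counter(
        tuple(sum(c * x for c, x in zip(row, xs)) % p for row in rows)
        for xs in product(range(p), repeat=alpha)
    )
    total = p ** alpha
    return -sum(c / total * math.log(c / total, p) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical generator matrix over F_3; the third row is the sum of the first two.
    C = [[1, 0, 2], [0, 1, 1], [1, 1, 0]]
    print(rank_mod_p(C, 3), linear_entropy(C, 3))
```

Here $m = 3$ linear polynomials are given, but only two are linearly independent, so the brute-force entropy agrees with the rank, as in case 2 of the proof.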

Remark 1 In this lemma, the theories of permutation polynomials and orthogonal systems are used to demonstrate
that the joint entropy of symbols included in multiple sets of repair data may be a non-integer value
in the nonlinear context, while it has to be an integer in the linear context, which is
important for the later discussion on secrecy capacity of linear MSR codes.

Additionally, it is of independent interest that these two theories in finite fields are also applicable to
the nonlinear context, because they may be utilized to explore the construction of nonlinear optimal
regenerating codes. That is beyond the scope of this paper, though. In this paper, we mainly study the
secrecy capacity of linear MSR codes, while some new insights on general MSR codes are also presented.





2.4 A Universal Upper Bound 


► Eavesdropper Model: Let $E$ be a set of $l_1$ nodes to which the eavesdropper has access, and $F$ be
another disjoint set of $l_2$ nodes whose repair data can be observed by the eavesdropper. In other words,
the eavesdropper is assumed to have knowledge of $\{W_E, S^F\}$. Furthermore, we assume $l_1 + l_2 < k$,
since otherwise the eavesdropper can retrieve the entire data message. Under this eavesdropper model, we let $G$
be another subset $G \subseteq [1, d+1] \setminus (E \cup F)$ of size $(k - l_1 - l_2)$. Based on this model, a universal
upper bound on the secrecy capacity of any optimal regenerating code is given as follows.

Lemma 3. For any secure optimal regenerating code with $\{n = d+1, k, d, \alpha, \beta\}$, we have

$$\begin{aligned} B^{(s)} &\le H(W_E, W_F, W_G \mid W_E, S^F) \\ &= H(W_G \mid W_E, W_F) - H(S^F \mid W_E, W_F) \\ &= \sum_{i=l_1+l_2+1}^{k} \min\{\alpha, (d-i+1)\beta\} - H(S^F \mid W_E, W_F). \end{aligned} \tag{18}$$

Proof. First, in secure regenerating codes [31, 32, 33], the random variables associated with the message
can be viewed as the tuple $(D, R)$, where $D$ corresponds to the actual data file and $R$ corresponds to
the randomness added. The secure file size is $B^{(s)} = H(D)$ and the secrecy condition requires that
$I(D; W_E, S^F) = 0$. Thus, it must be that

$$\begin{aligned} H(D) &= H(D) - I(D; W_E, S^F) \\ &= H(D \mid W_E, S^F) \\ &\le H(D, R \mid W_E, S^F) \\ &= H(W_E, W_F, W_G \mid W_E, S^F), \end{aligned} \tag{19}$$

where the equation in the last step follows from the reconstruction property.

Second, we have

$$\begin{aligned} &H(W_G \mid W_E, W_F) - H(W_E, W_F, W_G \mid W_E, S^F) \\ &= H(W_G \mid W_E, W_F) - H(W_E, W_F, W_G \mid W_E, W_F, S^F) \\ &= H(W_G \mid W_E, W_F) - H(W_G \mid W_E, W_F, S^F) \\ &= I(W_G; S^F \mid W_E, W_F) \\ &= H(S^F \mid W_E, W_F) - H(S^F \mid W_E, W_F, W_G) \\ &= H(S^F \mid W_E, W_F), \end{aligned} \tag{20}$$

where the regeneration property leads to $H(S^F) = H(S^F, W_F)$, which is used in the first step, and the last
step holds because the $k$ nodes in $E \cup F \cup G$ determine all the data by the reconstruction property, so that
$H(S^F \mid W_E, W_F, W_G) = 0$.

At last, for the optimal regenerating codes, it follows from the proof of property 1 in [9] that

$$H(W_G \mid W_E, W_F) = \sum_{i=l_1+l_2+1}^{k} \min\{\alpha, (d-i+1)\beta\}. \tag{21}$$

Remark 2 In the context of linear regenerating codes, MRD (Maximum Rank Distance) codes (e.g.,
Gabidulin codes [36, 37]) can be used to pre-code the original data of size $B = k\alpha$, which is required to
consist of an actual data file $D$ of size $B - H(W_E, S^F)$ and random data $R$ of size $H(W_E, S^F)$. It should be
noted that $H(W_E, S^F)$ is also an integer, as derived in Lemma 2, because $\{W_E, S^F\}$ are obtained by
linear combinations of the original data message of size $B$. As shown in [33, 35], this kind of secure code
construction can always meet the secrecy condition $I(D; W_E, S^F) = 0$. It means exactly that the maximal
file size that can be securely stored is

$$B^{(s)} = B - H(W_E, S^F) = H(W_E, W_F, W_G \mid W_E, S^F). \tag{22}$$

In the MSR scenario, it is obvious that $H(W_G \mid W_E, W_F) = H(W_G) = (k - l_1 - l_2)\alpha$, following from
property 2 in [9]. Thus, we only need to concentrate on the term $H(S^F \mid W_E, W_F)$ in this paper.
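At the MSR point, Lemma 3 therefore reduces to $(k - l_1 - l_2)\alpha$ minus the single term $H(S^F \mid W_E, W_F)$. The Python sketch below simply evaluates that expression for a supplied value of the term. The function name and the numbers are assumptions for illustration; in particular, the value $h = (k - l_1 - l_2)\, l_2 \beta$ used in the second call is just the value that makes the expression equal the file size $(k - l_1 - l_2)(\alpha - l_2\beta)$ quoted earlier for the scheme of [31], and is not a result derived here.

```python
def msr_secrecy_upper_bound(k, alpha, l1, l2, h_repair):
    """Lemma 3 at the MSR point: (k - l1 - l2) * alpha - H(S^F | W_E, W_F).

    h_repair stands for H(S^F | W_E, W_F) and must be supplied by the caller.
    """
    if l1 + l2 >= k:
        raise ValueError("the eavesdropper model assumes l1 + l2 < k")
    return (k - l1 - l2) * alpha - h_repair


if __name__ == "__main__":
    k, alpha, beta = 3, 2, 1  # hypothetical MSR parameters with d = 4
    # No repair data observed (l2 = 0): the repair term vanishes.
    print(msr_secrecy_upper_bound(k, alpha, l1=1, l2=0, h_repair=0))
    # Illustrative value of the repair term for l1 = 0, l2 = 1.
    l1, l2 = 0, 1
    print(msr_secrecy_upper_bound(k, alpha, l1, l2, h_repair=(k - l1 - l2) * l2 * beta))
```

The sketch makes explicit that everything specific to the code construction enters the bound only through the conditional entropy of the observed repair data.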

3 DATA SECRECY FOR GENERAL MSR CODES 

In this section, we give some general properties of MSR codes and a simple expression of upper bound on 
secrecy capacity, which will be leveraged throughout this paper. Afterwards, stable MSR code as a new 
concept is introduced, where the stable property will be shown to be closely linked with secrecy capacity. 


3.1 Properties of General MSR Codes 

Here, we provide some new properties of general MSR codes (including the nonlinear context),
which stem from the reconstruction and regeneration properties of MSR codes. With these
properties, we can further simplify the term $H(S^F \mid W_E, W_F)$ mentioned above.

Lemma 4. In the scenario of MSR codes with parameter set $\{n = d + 1, k, d, \alpha, \beta\}$, for any node $i$ with
efficient repair, consider two arbitrary subsets $A'$ and $B'$ such that $|A'| = k - 1$, $|B'| = d - k + 1$,
$A' \cap B' = \emptyset$ and $A' \cup B' = [1, d + 1] \setminus \{i\}$. Then it must be that

\[
\begin{cases}
H(S^i_{A' \cup B'}) = d\beta, \\
H(S^i_{B'} \mid W_i, S^i_{A'}) = 0.
\end{cases}
\tag{23}
\]

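Proof. Without loss of generality, we assume $i = 1$. The proof is given in two steps as follows.

1. According to Property 2 in [5], it is immediate that $I(W_1; W_{A'}) = 0$ in the MSR scenario, which
leads to $H(W_1 \mid S^1_{A'}) = \alpha$.

2. We set $B' = (b_1, b_2, \dots, b_{d-k+1})$. Due to the repair property, it must be that $H(W_1 \mid S^1_{A'}, S^1_{B'}) = 0$.
Next, some key inequalities follow from Lemma 1:

\[
\begin{aligned}
& H(W_1 \mid S^1_{A'}) - H(W_1 \mid S^1_{A'}, S^1_{b_1}) \\
&\quad = I(W_1; S^1_{b_1} \mid S^1_{A'}) \\
&\quad = H(S^1_{b_1} \mid S^1_{A'}) - H(S^1_{b_1} \mid W_1, S^1_{A'}) \le \beta, \\
& H(W_1 \mid S^1_{A'}, S^1_{b_1}) - H(W_1 \mid S^1_{A'}, S^1_{b_1}, S^1_{b_2}) \\
&\quad = I(W_1; S^1_{b_2} \mid S^1_{A'}, S^1_{b_1}) \\
&\quad = H(S^1_{b_2} \mid S^1_{A'}, S^1_{b_1}) - H(S^1_{b_2} \mid W_1, S^1_{A'}, S^1_{b_1}) \le \beta, \\
& \qquad \vdots \\
& H(W_1 \mid S^1_{A'}, S^1_{b_1}, \dots, S^1_{b_{d-k}}) - H(W_1 \mid S^1_{A'}, S^1_{B'}) \\
&\quad = I(W_1; S^1_{b_{d-k+1}} \mid S^1_{A'}, S^1_{B' \setminus \{b_{d-k+1}\}}) \\
&\quad = H(S^1_{b_{d-k+1}} \mid S^1_{A'}, S^1_{B' \setminus \{b_{d-k+1}\}}) - H(S^1_{b_{d-k+1}} \mid W_1, S^1_{A'}, S^1_{B' \setminus \{b_{d-k+1}\}}) \le \beta.
\end{aligned}
\tag{24}
\]

By summing up the left sides of the inequalities, we derive

\[
\alpha = H(W_1 \mid S^1_{A'}) - H(W_1 \mid S^1_{A'}, S^1_{B'}) \le (d - k + 1)\beta.
\tag{25}
\]

Because $\alpha = (d - k + 1)\beta$, all the inequalities in (24) must in fact hold with equality. Thus, for
any $j \in [1, d - k + 1]$, we have

\[
\begin{cases}
H(S^1_{b_j} \mid S^1_{A'}, S^1_{\{b_1, \dots, b_{j-1}\}}) = \beta, \\
H(S^1_{b_j} \mid W_1, S^1_{A'}, S^1_{\{b_1, \dots, b_{j-1}\}}) = 0,
\end{cases}
\tag{26}
\]

from which we can derive

\[
\begin{aligned}
H(S^1_{A' \cup B'}) &= H(S^1_{A'}) + H(S^1_{B'} \mid S^1_{A'}) \\
&= (k - 1)\beta + \sum_{j=1}^{d-k+1} H(S^1_{b_j} \mid S^1_{A'}, S^1_{\{b_1, \dots, b_{j-1}\}}) \\
&= (k - 1)\beta + (d - k + 1)\beta \\
&= d\beta
\end{aligned}
\tag{27}
\]

and

\[
\begin{aligned}
H(S^1_{B'} \mid W_1, S^1_{A'}) &= \sum_{j=1}^{d-k+1} H(S^1_{b_j} \mid W_1, S^1_{A'}, S^1_{\{b_1, \dots, b_{j-1}\}}) \\
&= 0.
\end{aligned}
\tag{28}
\]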


Remark 3 This lemma exhibits special properties of MSR codes. $H(S^i_{A' \cup B'}) = d\beta$ means that the repair
data sent from disjoint sets of nodes upon failure of node $i$ are mutually independent. $H(S^i_{B'} \mid W_i, S^i_{A'}) = 0$
implies that, given the contents of node $i$ and the repair data from any $k - 1$ nodes, the repair data from
the remaining $d - k + 1$ nodes are deterministic.


Lemma 5. In the MSR scenario with $\{n = d + 1, k, d, \alpha, \beta\}$, we have

\[
H(S^F \mid W_E, W_F) = H(S^F_G),
\tag{29}
\]

where $E$, $F$ and $G$ are pairwise disjoint sets as defined in Section 2.4 and $|E \cup F \cup G| = k$. Furthermore,
when $E = \emptyset$, we still have $H(S^F \mid W_F) = H(S^F_G)$, where $|F \cup G| = k$.

Proof. Assume that the $d + 1$ nodes consist of $E$, $F$, $G$ and $T$, where $|E \cup F \cup G| = k$ and $|T| = d - k + 1$.
Then we have

\[
\begin{aligned}
H(S^F \mid W_{\{E,F\}}) &= H(S^F_{\{E,F,G,T\}} \mid W_{\{E,F\}}) \\
&= H(S^F_{\{G,T\}} \mid W_{\{E,F\}}) \\
&= H(S^F_G \mid W_{\{E,F\}}) + H(S^F_T \mid W_{\{E,F\}}, S^F_G).
\end{aligned}
\tag{30}
\]

Then, since $|(E, F, G) \setminus i| = k - 1$, Lemma 4 implies that, for any $i \in F$,

\[
\begin{aligned}
H(S^i_T \mid W_{\{E,F\}}, S^F_G) &\le H(S^i_T \mid W_{\{E,F\}}, S^i_G) \\
&= H(S^i_T \mid W_i, W_{\{(E,F) \setminus i\}}, S^i_G) \\
&\le H(S^i_T \mid W_i, S^i_{\{(E,F) \setminus i\}}, S^i_G) \\
&= H(S^i_T \mid W_i, S^i_{\{(E,F,G) \setminus i\}}) \\
&= 0,
\end{aligned}
\tag{31}
\]

from which we derive $H(S^F_T \mid W_{\{E,F\}}, S^F_G) = 0$.

Again by Property 2 in [5], it must be that $H(S^F_G \mid W_E, W_F) = H(S^F_G)$, since $|E \cup F \cup G| = k$.
This completes the proof. In addition, the above deduction clearly applies to the situation
when $E = \emptyset$.

Remark 4 Combining this lemma with Lemma 3, we have

\[
B^{(s)} \le (k - l_1 - l_2)\alpha - H(S^F_G).
\tag{32}
\]

In particular, for linear MSR codes, this upper bound can always be achieved (as in Remark 2).
Moreover, the equation $H(S^F \mid W_F) = H(S^F_G)$ leads to the next result.


Lemma 6. In the MSR scenario with $\{n = d + 1, k, d, \alpha, \beta\}$, for any subset $F$ such that $|F| \le k - 1$, and
arbitrary distinct $i_1, i_2$ where $i_1, i_2 \notin F$, we have $H(S^F_{i_1}) = H(S^F_{i_2})$.

Proof. According to Lemma 5, we obtain

\[
\begin{aligned}
H(S^F) &= H(S^F, W_F) \\
&= H(W_F) + H(S^F \mid W_F) \\
&= H(W_F) + H(S^F_{G'} \mid W_F) \\
&= H(W_F) + H(S^F_{G'}),
\end{aligned}
\tag{33}
\]

where $G'$ is any subset of $[1, d + 1]$ such that $|G'| + |F| = k$ and $G' \cap F = \emptyset$. Based on the condition
$|F| \le k - 1$, it must be that $|G'| \ge 1$.

1. When $|G'| = 1$, for any two different $g_1$ and $g_2$ where $g_1, g_2 \in [1, d + 1] \setminus F$,

\[
H(S^F) = H(W_F) + H(S^F_{g_1}) = H(W_F) + H(S^F_{g_2}),
\tag{34}
\]

which indicates $H(S^F_{g_1}) = H(S^F_{g_2})$.

2. When $|G'| \ge 2$, we set $G' = \{g', G_1\}$ and $G'' = \{g'', G_1\}$ such that $g' \ne g''$, $|G'| = |G''| = k - |F|$
and $G' \cap F = G'' \cap F = \emptyset$, where $G''$ plays the same role as $G'$ in the following statement. Similarly,
we derive

\[
\begin{cases}
H(S^F) = H(W_F) + H(S^F_{G'}) = H(W_F) + H(S^F_{g'}) + H(S^F_{G_1}), \\
H(S^F) = H(W_F) + H(S^F_{G''}) = H(W_F) + H(S^F_{g''}) + H(S^F_{G_1}),
\end{cases}
\tag{35}
\]

which implies $H(S^F_{g'}) = H(S^F_{g''})$.

Because the choices of $(g_1, g_2)$ and $(g', g'')$ are arbitrary, for arbitrary distinct $i_1, i_2 \notin F$
we have $H(S^F_{i_1}) = H(S^F_{i_2})$.


Remark 5 Lemma 6 shows that the entropy of the repair data from any two nodes assisting in repairing the
same subset of nodes is identical. Combining this lemma with Remark 4, we further obtain the following
result on the upper bound.








3.2 A Simple Expression of Upper Bound 


Incorporating Lemma 3, Lemma 5 and Lemma 6, we derive a simple and generally applicable
result on secrecy capacity as follows.

Theorem 1. In the scenario of MSR codes with parameter set $\{n = d + 1, k, d, \alpha, \beta\}$, we have

\[
B^{(s)} \le (k - l_1 - l_2)(\alpha - H(S^F_g)),
\tag{36}
\]

where $g \in G$, $|G| = k - l_1 - l_2$, and $|F| = l_2$.

Remark 6 This can be viewed as a simple and generally applicable upper bound on $B^{(s)}$, since we only
need to calculate or estimate the joint entropy $H(S^F_g)$ of the repair data transmitted from any single node.
Again by Remark 2, this upper bound can be reached in the scenario of linear MSR codes.


3.3 Stable MSR Codes 

Given an MSR code with $\{n = d + 1, k, d, \alpha, \beta\}$, since our focus is exact repair, the random variables
$W_j$ are invariant with time, i.e., they remain constant irrespective of the sequence of failures and repairs
that occur in the storage system. Once such an MSR code with $n = d + 1$ is constructed, the
content of $S_i^j$ sent from a node $i$ to repair another node $j$ also stays invariant. However, for an MSR
code with $\{n > d + 1, k, d, \alpha, \beta\}$, the repair data $S_i^j$ technically need not stay constant and may vary with
different sets of helper nodes that include the same node $i$, as long as the per-node storage
$W_j$ stays unchanged. For instance, when node $j$ fails and node $i$ is assigned to assist in repairing node $j$,
there exist $\binom{n-2}{d-1}$ possible sets of helper nodes that include node $i$.
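
As a small illustration of this count, the following Python sketch enumerates the helper sets directly. It is only a counting aid under the assumption that nodes are labelled 0 to n-1; the function name `helper_sets_including` is hypothetical. It checks that the number of size-$d$ helper sets for a failed node that contain a fixed helper equals $\binom{n-2}{d-1}$.

```python
from itertools import combinations
from math import comb


def helper_sets_including(n, d, failed, helper):
    """All size-d helper sets for `failed` that contain `helper` (nodes are 0..n-1)."""
    others = [v for v in range(n) if v not in (failed, helper)]
    # Pick the remaining d-1 helpers among the n-2 nodes other than failed and helper.
    return [frozenset((helper,) + rest) for rest in combinations(others, d - 1)]


if __name__ == "__main__":
    n, d = 7, 4  # illustrative parameters with n > d + 1
    sets = helper_sets_including(n, d, failed=0, helper=1)
    print(len(sets), comb(n - 2, d - 1))
```

When $n = d + 1$ the count is 1, which matches the statement that the repair data are automatically invariant in that case.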

Assume the repair data of node $j$ is captured by the eavesdropper (see footnote 3). If the content of the repair data $S_i^j$ is not
independent of the choice of the set of helper nodes and varies with it, then after multiple repair epochs
with different sets of helper nodes including node $i$, different information regarding the repair data $S_i^j$ will be
exposed to the eavesdropper. Thus, the eavesdropper can observe more information regarding the
repair data of node $j$ than in the case of invariant repair data. In the following, we
use an example to illustrate this security issue.


Example 1 Assume $E = \emptyset$ and $F = \{1\}$, i.e., only the repair data of node 1 is eavesdropped. Consider
two truncated MSR codes $M$ and $M'$ consisting of the node sets $[1, d + 1]$ and $\{1, 3, \dots, d + 2\}$, respectively, from
an MSR code with $\{n > d + 1, k, d, \alpha, \beta\}$.

Since they are still MSR codes, they necessarily retain the properties in Section 3.1. Thus, we have

\[
\begin{cases}
H(S^1(M)) = H(W_1) + (k - 1)H(S^1_{i_1}(M)) = \alpha + (k - 1)\beta = d\beta, \\
H(S^1(M')) = H(W_1) + (k - 1)H(S^1_{i_2}(M')) = \alpha + (k - 1)\beta = d\beta,
\end{cases}
\tag{37}
\]

where $S^1(M)$ and $S^1(M')$ respectively represent the repair data of node 1 in the contexts
of the two truncated MSR codes. Besides, it follows from Lemma 6 that $H(S^1_{i_1}(M))$ takes the same value for
every helper node $i_1 \in M$, and $H(S^1_{i_2}(M'))$ takes the same value for every helper node $i_2 \in M'$.


3 It would be reasonable to assume here that the identity of node j can be recognized by eavesdropper, although 
node j when failed will be replaced by newcomer nodes. In this case, the eavesdropper will gain access to all 
repair data via sitting on the same node j undergoing different repair epochs. 
Furthermore, similar to the deduction of the properties given in Section 3.1, we derive

\[
\begin{aligned}
& H(S^1(M), S^1(M')) \\
&= H(W_1, S^1(M), S^1(M')) \\
&= H(W_1) + H(S^1(M), S^1(M') \mid W_1) \\
&= H(W_1) + H(S^1_{[2,d+1]}(M), S^1_{[3,d+2]}(M') \mid W_1) \\
&= H(W_1) + H(S^1_{[3,k+1]}(M, M') \mid W_1) \\
&\quad + H(S^1_{\{2,k+2,\dots,d+1\}}(M), S^1_{[k+2,d+2]}(M') \mid W_1, S^1_{[3,k+1]}(M, M')) \\
&= H(W_1) + H(S^1_{[3,k+1]}(M, M') \mid W_1) \\
&= H(W_1) + (k - 1)H(S^1_i(M, M')),
\end{aligned}
\tag{38}
\]

where $H(S^1_{\{2,k+2,\dots,d+1\}}(M), S^1_{[k+2,d+2]}(M') \mid W_1, S^1_{[3,k+1]}(M, M')) = 0$ results from Lemma 4.

If $S^1_i(M)$ does not share the same information with $S^1_i(M')$, it must be that $H(S^1_i(M, M')) > \beta$,
which leads to $H(S^1(M), S^1(M')) > d\beta$. This means that the eavesdropper will inevitably obtain different
information after multiple repair epochs with different sets of helper nodes. When traversing all possible
truncated MSR codes corresponding to repair epochs with all possible sets of helper nodes, it may even
render the storage system unable to maintain any data secrecy.

Based on the above security concern, we define a special MSR code as follows. 

Definition 7. (Stable MSR Code): A stable MSR code with $\{n > d + 1, k, d, \alpha, \beta\}$ is an MSR code with
the "stable" repair property, i.e., the data transmitted from any node $i$ to repair node $j$ is independent of
the set of helper nodes that includes node $i$. In other words, the content of $S_i^j$ remains invariant under
different sets of helper nodes including the same node $i$.

One can check that the product-matrix-based MSR code m is a stable MSR code. The following 
lemma will show that this stable property is in fact the condition under which any MSR code
with $\{n > d + 1, k, d, \alpha, \beta\}$ and its truncated code with $\{n = d + 1, k, d, \alpha, \beta\}$ have the same secrecy capacity.

Lemma 7. Let $N$ be a stable MSR code with the parameter set $\{n > d + 1, k, d, \alpha, \beta\}$ and $N'$ be the stable
MSR code with $\{n = d + 1, k, d, \alpha, \beta\}$ truncated from $N$. Then the secrecy capacity of $N$ is the same as that of
$N'$.

Proof. Without loss of generality, assume $N'$ consists of the node set $[1, d + 1]$ truncated from $N$.
We use the same subsets $E, F, G$ for $N$ and $N'$, where $E, F, G$ are three disjoint subsets of $[1, d + 1]$ as
defined in Section 2.4.

Lemma 3 indicates that, for any secure regenerating code with $\{n = d + 1, k, d, \alpha, \beta\}$,

\[
B^{(s)} \le H(W_G \mid W_E, W_F) - H(S^F \mid W_E, W_F).
\tag{39}
\]

Although this universal upper bound is established for regenerating codes of length $d + 1$,
it also applies to extended regenerating codes with $n > d + 1$, since they still have
the reconstruction and regeneration properties. To avoid confusion, we let $S^F(N)$
and $S^F(N')$ respectively represent the repair data of the node set $F$ in the contexts of $N$ and $N'$.
Accordingly, we have

\[
\begin{cases}
B^{(s)}(N) \le H(W_G \mid W_E, W_F) - H(S^F(N) \mid W_E, W_F), \\
B^{(s)}(N') \le H(W_G \mid W_E, W_F) - H(S^F(N') \mid W_E, W_F),
\end{cases}
\tag{40}
\]

so we only need to prove that $H(S^F(N) \mid W_E, W_F) = H(S^F(N') \mid W_E, W_F)$. Since $N$ and $N'$ are both
stable MSR codes, we can unambiguously substitute $S^F(N) = S^F_{[1,n]}$ and $S^F(N') = S^F_{[1,d+1]}$. Thus, it
suffices to show that

\[
H(S^F_{[1,n]} \mid W_E, W_F) = H(S^F_{[1,d+1]} \mid W_E, W_F),
\tag{41}
\]

for which it is sufficient to prove that $H(S^F_{[1,n]} \mid S^F_{[1,d+1]}) = 0$. This is equivalent to proving that
$H(S^i_{[1,n]} \mid S^i_{[1,d+1]}) = H(S^i_{[d+2,n]} \mid S^i_{[1,d+1]}) = 0$ for any $i \in F$.

Without loss of generality, we consider the situation when $i = 1$. Thus, we have

\[
\begin{aligned}
H(S^1_{[d+2,n]} \mid S^1_{[1,d+1]}) &= H(S^1_{[d+2,n]} \mid S^1_{[2,d+1]}) \\
&= H(S^1_{[d+2,n]} \mid W_1, S^1_{[2,d+1]}) \\
&\le H(S^1_{[d+2,n]} \mid W_1, S^1_{[2,k]}).
\end{aligned}
\tag{42}
\]

1. When $n - d - 1 \ge d - k + 1$, let $Q$ be any subset of $[d + 2, n]$ of size $d - k + 1$. Because
any truncated code of $N$ with $\{d + 1, k, d, \alpha, \beta\}$ is still an MSR code, the node set $[1, k] \cup Q$ can be
viewed as a truncated MSR code. Due to the second term of equation (23) in Lemma 4, we further derive
$H(S^1_Q \mid W_1, S^1_{[2,k]}) = 0$. Since $Q$ is an arbitrary subset of $[d + 2, n]$, it follows that $H(S^1_{[d+2,n]} \mid W_1, S^1_{[2,k]}) = 0$.

2. When $n - d - 1 < d - k + 1$, let $Q$ be any set of size $d - k + 1$ such that $[d + 2, n] \subset Q$ and $[1, k] \cap Q = \emptyset$.
Similarly, we have $H(S^1_Q \mid W_1, S^1_{[2,k]}) = 0$, from which we can also derive $H(S^1_{[d+2,n]} \mid W_1, S^1_{[2,k]}) = 0$.

Combined with formula (42), both cases imply that $H(S^1_{[d+2,n]} \mid S^1_{[1,d+1]}) = 0$.

Remark 7 Lemma 7 indicates that the secrecy capacity of stable MSR codes does not depend on the parameter
$n$ but only on the remaining parameters $\{k, d, \alpha, \beta, B\}$. One example of stable MSR codes with $\{n > d + 1\}$ is
the product-matrix-based MSR code given by Rashmi et al. In another aspect, it is an interesting question
to design an MSR code with the unstable property. However, for any unstable MSR code with $\{n > d + 1\}$,
its secrecy capacity is strictly less than that of the corresponding truncated one with $\{n = d + 1\}$, as shown
in Example 1. Thus, the stable property is highly advantageous in constructing secure MSR codes.

Note 2 In the subsequent discussion, we focus on the secrecy capacity of linear MSR codes with
$\{n = d + 1, k, d, \alpha, \beta\}$.

4 SECRECY CAPACITY OF LINEAR MSR CODES 


In this section, we give a comprehensive and explicit result on secrecy capacity for linear MSR
codes with $\{n = d + 1, k, d, \alpha, \beta\}$, which is divided into two categories. In the first category, the secrecy
capacity is fully characterized, which applies to all linear scalar MSR codes, i.e., $\beta = 1$. In the second
category, upper bounds on secrecy capacity are presented, which apply to all known vector codes with
$\{\beta = (d - k + 1)^x\}$ where $x \ge 1$, such as the Zigzag code [53]. Furthermore, these two categories will be shown
to also apply to unexplored linear vector MSR codes with $\{1 < \beta < d - k + 1\}$. Before this, we
first give a lemma that will be used in the subsequent proofs.

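Lemma 8. Given any regenerating code with $\{n = d + 1, k, d, \alpha, \beta\}$, for any set $J = (j_1, j_2, \dots, j_m) \subseteq [1, d + 1]$,
we have

\[
H(S^J) = H\big(S^{j_1}_{[1,d+1] \setminus j_1}, S^{j_2}_{[1,d+1] \setminus (j_1,j_2)}, \dots, S^{j_m}_{[1,d+1] \setminus (j_1,j_2,\dots,j_m)}\big).
\tag{43}
\]

Proof. The proof is obtained from two directions.

First, it is clear that

\[
H\big(S^{j_1}_{[1,d+1] \setminus j_1}, S^{j_2}_{[1,d+1] \setminus (j_1,j_2)}, \dots, S^{j_m}_{[1,d+1] \setminus (j_1,j_2,\dots,j_m)}\big) \le H(S^J).
\tag{44}
\]

Second, we can deduce that

\[
\begin{aligned}
& H\big(S^{j_1}_{[1,d+1] \setminus j_1}, S^{j_2}_{[1,d+1] \setminus (j_1,j_2)}, \dots, S^{j_m}_{[1,d+1] \setminus (j_1,\dots,j_m)}\big) \\
&= H\big(S^{j_1}_{[1,d+1] \setminus j_1}, W_{j_1}, S^{j_2}_{[1,d+1] \setminus (j_1,j_2)}, \dots, S^{j_m}_{[1,d+1] \setminus (j_1,\dots,j_m)}\big) \\
&= H\big(S^{j_1}_{[1,d+1] \setminus j_1}, W_{j_1}, S^{\{j_2,\dots,j_m\}}_{j_1}, S^{j_2}_{[1,d+1] \setminus (j_1,j_2)}, \dots, S^{j_m}_{[1,d+1] \setminus (j_1,\dots,j_m)}\big) \\
&= H\big(S^{j_1}_{[1,d+1] \setminus j_1}, W_{j_1}, S^{j_2}_{[1,d+1] \setminus j_2}, \dots, S^{j_m}_{[1,d+1] \setminus (j_2,\dots,j_m)}\big) \\
&\;\;\vdots \\
&= H\big(S^{j_1}_{[1,d+1] \setminus j_1}, W_{j_1}, S^{j_2}_{[1,d+1] \setminus j_2}, W_{j_2}, \dots, S^{j_m}_{[1,d+1] \setminus j_m}, W_{j_m}\big) \\
&= H(S^J),
\end{aligned}
\tag{45}
\]

where the successive steps follow from the fact that, for any $l \in [1, m]$,

\[
\begin{cases}
H\big(S^{j_l}_{[1,d+1] \setminus j_l}\big) = H\big(S^{j_l}_{[1,d+1] \setminus j_l}, W_{j_l}\big), \\
H\big(S^{\{j_{l+1},\dots,j_m\}}_{j_l}, W_{j_l}\big) = H(W_{j_l}).
\end{cases}
\tag{46}
\]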

Remark 8 This lemma indicates that there exists much dependence among the repair data of multiple sets
of nodes. With it, we can reduce the number of helper nodes considered for some failed nodes. From it, we
obtain

\[
H\big(S^{j_m}_{[1,d+1] \setminus (j_1,j_2,\dots,j_m)} \mid S^{\{j_1,j_2,\dots,j_{m-1}\}}\big) = H\big(S^{\{j_1,j_2,\dots,j_m\}}\big) - H\big(S^{\{j_1,j_2,\dots,j_{m-1}\}}\big),
\tag{47}
\]

which will be used in the proofs later.


4.1 Category 1: Precise Value of Secrecy Capacity 

Here, we give the precise value of secrecy capacity for linear MSR codes with $\{1 \le \beta < \frac{d-k+1}{l_2-1}\}$ and
as a result prove the optimality of the secure product-matrix-based MSR codes given in m- 


4.1.1 The situation when $\beta = 1$.

Theorem 2. In the linear MSR scenario, for any subsets $P$ and $T$ with $|P| = k$, $|T| = d - k + 1$ and $P \cap T = \emptyset$,
any $F$ such that $F \subseteq T$ and $|F| \le k - 1$, and arbitrary $i \notin F$, we have $H(S_i^F) = |F|\beta = |F|$ when
$\beta = 1$.


Proof. Assume $P = [1, k]$ and $T = [k + 1, d + 1]$. Without loss of generality, also assume $F = [k + 1, k + c]$,
where $c \ge 2$ (the case $c = 1$ is trivial). In the linear MSR scenario, Lemma 2 indicates that $H(S_i^A)$ must
be an integer for any subset $A \subseteq F$ and $i \notin A$.

Arguing by contradiction, under the condition $\beta = 1$, we assume $c$ is the smallest value satisfying
$H(S_i^{[k+1,k+c]}) = H(S_i^{[k+1,k+c-1]})$. Based on Lemma 6, we know that for any $i \notin [k + 1, k + c]$,
it must be that $H(S_i^{[k+1,k+c-1]}) = H(S_i^{[k+1,k+c]}) = (c - 1)\beta$, from which we further derive
$H(S_i^{k+c} \mid S_i^{[k+1,k+c-1]}) = 0$. Then, following from Lemma 8, we have that for any $j \in [1, d - k + 1]$,

\[
\begin{cases}
H(S^{[k+1,k+j]}) = H\big(S^{k+1}_{[1,k] \cup [k+2,d+1]}, S^{k+2}_{[1,k] \cup [k+3,d+1]}, \dots, S^{k+j}_{[1,k] \cup [k+j+1,d+1]}\big), \\
H\big(S^{k+j}_{[1,k] \cup [k+j+1,d+1]} \mid S^{[k+1,k+j-1]}\big) = H(S^{[k+1,k+j]}) - H(S^{[k+1,k+j-1]}).
\end{cases}
\tag{48}
\]

On the one hand, since we have $H(S_i^{k+c} \mid S_i^{[k+1,k+c-1]}) = 0$ for any $i \notin [k + 1, k + c]$ from the above assumption,
we have

\[
\begin{aligned}
& H\big(S^{k+c}_{[1,k] \cup [k+c+1,d+1]} \mid S^{[k+1,k+c-1]}\big) \\
&= H\big(S^{k+c}_{[1,k] \cup [k+c+1,d+1]} \mid S^{k+1}_{[1,k] \cup [k+2,d+1]}, S^{k+2}_{[1,k] \cup [k+3,d+1]}, \dots, S^{k+c-1}_{[1,k] \cup [k+c,d+1]}\big) \\
&\le H\big(S^{k+c}_{[1,k] \cup [k+c+1,d+1]} \mid S^{[k+1,k+c-1]}_{[1,k] \cup [k+c+1,d+1]}\big) \\
&= 0.
\end{aligned}
\tag{49}
\]

On the other hand, we derive

\[
\begin{aligned}
& H\big(S^{k+c}_{[1,k] \cup [k+c+1,d+1]} \mid S^{[k+1,k+c-1]}\big) \\
&= H(S^{[k+1,k+c]}) - H(S^{[k+1,k+c-1]}) \\
&= \{H(W_{[k+1,k+c]}) + H(S^{[k+1,k+c]}_{G'})\} - \{H(W_{[k+1,k+c-1]}) + H(S^{[k+1,k+c-1]}_{G''})\} \\
&= \{c\alpha + (k - c)(c - 1)\beta\} - \{(c - 1)\alpha + (k - c + 1)(c - 1)\beta\} \\
&= \alpha - (c - 1)\beta \\
&= (d - k - c + 2)\beta,
\end{aligned}
\tag{50}
\]

where

\[
\begin{cases}
H(S^{[k+1,k+c]}) = H(W_{[k+1,k+c]}) + H(S^{[k+1,k+c]}_{G'}), \\
H(S^{[k+1,k+c-1]}) = H(W_{[k+1,k+c-1]}) + H(S^{[k+1,k+c-1]}_{G''})
\end{cases}
\tag{51}
\]

result from Lemma 5, and $G'$, $G''$ are defined as in Lemma 6 with $|G'| = k - c$ and $|G''| = k - c + 1$.

Now we compare equations (49) and (50) when $c \le \min\{d - k + 1, k - 1\}$.
The right-hand side of (50) is a monotonically decreasing function of $c$, so there are two cases.

1. If $d - k + 1 \ge k - 1$, then at $c = k - 1$ equation (50) takes its minimum value $(d - 2k + 3)\beta$, which is
strictly greater than 0.

2. If $d - k + 1 < k - 1$, then at $c = d - k + 1$ equation (50) reaches its minimum value $\beta$, which is still positive.

Both cases indicate that equation (50) contradicts formula (49) when $c \le \min\{d - k + 1, k - 1\}$,
i.e., the assumption that $H(S_i^{[k+1,k+c-1]}) = H(S_i^{[k+1,k+c]})$ cannot hold. In other words, there does
not exist a value $c$ with $H(S_i^{[k+1,k+c-1]}) = H(S_i^{[k+1,k+c]})$ when $\beta = 1$ and $c \le \min\{d - k + 1, k - 1\}$.

Therefore, for any $F$ such that $F \subseteq T$ and $|F| \le k - 1$, we have $H(S_i^F) = H(S_i^{[k+1,k+c]}) = c\beta$.


Corollary 1. In the linear MSR scenario, when $\beta = 1$, we have

\[
B^{(s)} = (k - l_1 - l_2)(\alpha - l_2\beta),
\tag{52}
\]

where $l_1 + l_2 \le k - 1$ and $l_2 \le d - k + 1$.

Proof. Remark 6 implies that $B^{(s)} = (k - l_1 - l_2)(\alpha - H(S_g^F))$ in the linear MSR scenario, where $g \notin F$.
Combining it with Theorem 2, we obtain this corollary.

Corollary 2. The product-matrix-hased secure MSR code given in m is optimal for any l\ +12 < fe — 1 
and I 2 < d — fc + 1 . 

Proof. First, the product-matrix-based MSR codes constructed in m is established on [3 = 1 and is a 
stable MSR code as stated in Remark [TJ Then, according to the construction of secure MSR codes in m, 
the $(l_1, l_2)$-secure MSR code achieves

\[
B^{(s)} = (k - l_1 - l_2)(\alpha - l_2\beta).
\tag{53}
\]

Thus, the secrecy capacity of secure MSR codes in m exactly complies with that given in Corollary [TJ 
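
The value in (52) and (53) can be sanity-checked numerically on a toy code. The sketch below is an illustration only and makes several assumptions that are not spelled out in this section: it uses the standard product-matrix MSR construction with $d = 2k - 2$ (message matrix $M = [S_1; S_2]$ with symmetric blocks, encoding matrix $\Psi = [\Phi \;\; \Lambda\Phi]$, helper $j$ sending $\psi_j^T M \phi_f$ to repair node $f$), the illustrative parameters $k = 3$, $d = 4$, $n = d + 1 = 5$ over GF(11), and zero-based node labels. Entropies of linear functions of the message are computed as ranks, and the secure file size is taken as $B - H(W_E, S^F)$, following (22).

```python
P = 11                      # prime field size (assumption of this sketch)
K, D = 3, 4                 # product-matrix MSR codes use d = 2k - 2
ALPHA, BETA = K - 1, 1
N = D + 1
B = K * ALPHA               # number of message symbols
XS = [1, 2, 3, 4, 5]        # evaluation points; x and x^2 are distinct mod 11


def unit(i):
    """Linear form equal to the i-th message symbol."""
    return [int(j == i) for j in range(B)]


def sym_block(offset):
    """Symmetric ALPHA x ALPHA block of independent message symbols (as linear forms)."""
    idx, blk = offset, [[None] * ALPHA for _ in range(ALPHA)]
    for r in range(ALPHA):
        for c in range(r, ALPHA):
            blk[r][c] = blk[c][r] = unit(idx)
            idx += 1
    return blk


M = sym_block(0) + sym_block(ALPHA * (ALPHA + 1) // 2)  # D x ALPHA matrix [S1; S2]


def phi(i):
    return [pow(XS[i], e, P) for e in range(ALPHA)]


def psi(i):
    lam = pow(XS[i], ALPHA, P)                           # lambda_i = x_i^ALPHA
    return phi(i) + [lam * v % P for v in phi(i)]


def combo(coeffs, forms):
    """Linear combination of linear forms over GF(P)."""
    return [sum(c * f[s] for c, f in zip(coeffs, forms)) % P for s in range(B)]


def stored(i):
    """W_i = psi_i^T M, a list of ALPHA linear forms."""
    return [combo(psi(i), [M[r][c] for r in range(D)]) for c in range(ALPHA)]


def repair(helper, failed):
    """Symbol sent by `helper` to repair `failed`: psi_helper^T M phi_failed."""
    return combo(phi(failed), stored(helper))


def rank(rows):
    """Rank over GF(P) by Gaussian elimination."""
    rows, rk = [list(r) for r in rows], 0
    for col in range(B):
        piv = next((r for r in range(rk, len(rows)) if rows[r][col] % P), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        inv = pow(rows[rk][col], -1, P)
        rows[rk] = [v * inv % P for v in rows[rk]]
        for r in range(len(rows)):
            if r != rk and rows[r][col] % P:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def entropy(E=(), F=()):
    """H(W_E, S^F) in symbols: stored data of E plus all repair data downloaded by F."""
    rows = [f for e in E for f in stored(e)]
    rows += [repair(h, f) for f in F for h in range(N) if h != f]
    return rank(rows)


def secure_file_size(l1, l2):
    """B - H(W_E, S^F) with E the first l1 nodes and F the next l2 nodes, as in (22)."""
    return B - entropy(tuple(range(l1)), tuple(range(l1, l1 + l2)))


if __name__ == "__main__":
    for l1 in range(K):
        for l2 in range(K - l1):
            print(l1, l2, secure_file_size(l1, l2), (K - l1 - l2) * (ALPHA - l2 * BETA))
```

Under these assumptions the two printed columns should agree for every $l_1 + l_2 \le k - 1$, which is the content of Corollary 1 for this particular scalar code. The same rank machinery can be used to check the two identities of Lemma 4 on one node.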
Remark 9 Corollary 1 is in fact applicable to all linear scalar MSR codes, i.e., linear MSR codes with
$\beta = 1$. In other words, by MRD pre-coding as stated in Remark 2, all linear scalar secure MSR
codes can offer the secrecy capacity given precisely in Corollary 1.


4.1.2 The situations when $1 < \beta < \frac{d-k+1}{l_2-1}$.

Theorem 3. In the linear MSR scenario, for any subsets $P$ and $T$ where $|P| = k$, $|T| = d - k + 1$ and $P \cap T = \emptyset$,
any $F$ such that $F \subseteq T$ and $|F| \le k - 1$, and arbitrary $i \notin F$, when $\beta < \frac{d-k+1}{|F|-1}$ or $|F| < 1 + \frac{d-k+1}{\beta}$,
we have $H(S_i^F) = |F|\beta$ where $\beta > 1$.


Proof. Similar to Theorem 2, we assume $P = [1, k]$, $T = [k + 1, d + 1]$ and $F = [k + 1, k + c]$ where $c \ge 2$.

Arguing by contradiction, we assume $c$ is the smallest value such that $H(S_i^{[k+1,k+c-1]}) = (c - 1)\beta$
and $H(S_i^{[k+1,k+c]}) = (c - 1)\beta + \theta$, where $\theta \in [0, \beta - 1]$ and $\theta$ must be an integer following from Lemma
2 when $\beta > 1$. From Lemma 6, we know $H(S_i^{[k+1,k+c-1]}) = (c - 1)\beta$ and $H(S_i^{[k+1,k+c]}) = (c - 1)\beta + \theta$,
where $\theta \in \mathbb{Z} \cap [0, \beta - 1]$, for any $i \notin [k + 1, k + c]$, from which we further have $H(S_i^{k+c} \mid S_i^{[k+1,k+c-1]}) = \theta$.

In the same way that Lemma 5 is used in the proof of Theorem 2, we first have

\[
\begin{aligned}
& H\big(S^{k+c}_{[1,k] \cup [k+c+1,d+1]} \mid S^{[k+1,k+c-1]}\big) \\
&= H(S^{[k+1,k+c]}) - H(S^{[k+1,k+c-1]}) \\
&= \{H(W_{[k+1,k+c]}) + H(S^{[k+1,k+c]}_{G'})\} - \{H(W_{[k+1,k+c-1]}) + H(S^{[k+1,k+c-1]}_{G''})\} \\
&= \{c\alpha + (k - c)[(c - 1)\beta + \theta]\} - \{(c - 1)\alpha + (k - c + 1)(c - 1)\beta\} \\
&= (d - k - c + 2)\beta + (k - c)\theta.
\end{aligned}
\tag{54}
\]

On the other hand, we obtain

\[
\begin{aligned}
& H\big(S^{k+c}_{[1,k] \cup [k+c+1,d+1]} \mid S^{[k+1,k+c-1]}\big) \\
&= H(S^{k+c}_1 \mid S^{[k+1,k+c-1]}) + H(S^{k+c}_2 \mid S^{[k+1,k+c-1]}, S^{k+c}_1) + \dots
 + H\big(S^{k+c}_{d+1} \mid S^{[k+1,k+c-1]}, S^{k+c}_{[1,k] \cup [k+c+1,d]}\big) \\
&\le H(S^{k+c}_1 \mid S_1^{[k+1,k+c-1]}) + H(S^{k+c}_2 \mid S_2^{[k+1,k+c-1]}) + \dots + H(S^{k+c}_{d+1} \mid S_{d+1}^{[k+1,k+c-1]}) \\
&= (d - c + 1)\theta.
\end{aligned}
\tag{55}
\]

Thus, if $(d - c + 1)\theta < (d - k - c + 2)\beta + (k - c)\theta$, i.e., $(d + 1 - k)\theta < (d - k - c + 2)\beta$, a contradiction
arises. In particular, when $\theta = \beta - 1$, $(d + 1 - k)\theta$ reaches its maximum $(d + 1 - k)(\beta - 1)$. By simplification,
we obtain that, when $\beta < \frac{d-k+1}{c-1}$ or $c < 1 + \frac{d-k+1}{\beta}$, equation (54) contradicts formula (55), which means
the assumption that $H(S_i^{[k+1,k+c-1]}) = (c - 1)\beta$ and $H(S_i^{[k+1,k+c]}) = (c - 1)\beta + \theta$ cannot hold when
$\theta \in \mathbb{Z} \cap [0, \beta - 1]$. That is to say, the value of $\theta$ here can only be exactly $\beta$.

Therefore, for any $F$ such that $F \subseteq T$ and $|F| \le k - 1$, when $\beta < \frac{d-k+1}{c-1}$ or $c < 1 + \frac{d-k+1}{\beta}$, we have
$H(S_i^F) = H(S_i^{[k+1,k+c]}) = c\beta$.
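
The simplification step in the proof above can be checked by brute force. The following Python sketch (function names are hypothetical) compares the statement "a contradiction arises for every integer $\theta \in [0, \beta - 1]$" with the closed-form condition $\beta < \frac{d-k+1}{c-1}$, written as $(c-1)\beta < d-k+1$ to stay in integer arithmetic.

```python
def contradiction_for_all_theta(d, k, c, beta):
    """True if (d+1-k)*theta < (d-k-c+2)*beta holds for every integer theta in [0, beta-1]."""
    return all((d + 1 - k) * theta < (d - k - c + 2) * beta for theta in range(beta))


def closed_form_condition(d, k, c, beta):
    """The condition beta < (d-k+1)/(c-1), written without division."""
    return (c - 1) * beta < d - k + 1


if __name__ == "__main__":
    # Scan a small, arbitrary grid of parameters and report any disagreement.
    mismatches = [
        (d, k, c, beta)
        for k in range(2, 7)
        for d in range(k, k + 7)
        for c in range(2, k + 1)
        for beta in range(1, 9)
        if contradiction_for_all_theta(d, k, c, beta) != closed_form_condition(d, k, c, beta)
    ]
    print("mismatches:", mismatches)
```

The two predicates agree because the left-hand side $(d+1-k)\theta$ is increasing in $\theta$, so only the extreme value $\theta = \beta - 1$ matters.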


Corollary 3. In the linear MSR scenario, when $\beta > 1$, we still have

\[
B^{(s)} = (k - l_1 - l_2)(\alpha - l_2\beta),
\tag{56}
\]

when $l_1 + l_2 \le k - 1$ and $l_2 < 1 + \frac{d-k+1}{\beta}$ or $\beta < \frac{d-k+1}{l_2-1}$.

Proof. Combining Theorem 2 and Theorem 3, this corollary can be derived in the same way as Corollary 1.


Remark 10 In this category, achievability can be attributed to the fact that $H(S_g^F)$ exactly reaches its maximal
value $l_2\beta$ when $l_2 < 1 + \frac{d-k+1}{\beta}$. In other words, there is no intersection pattern (dependence)
within $S_g^F$ in this category, i.e., all repair data included in $S_g^F$ are mutually independent. However,
sometimes $H(S_g^F)$ cannot be exactly calculated and can only be estimated, as will be shown next.
4.2 Category 2: Upper Bounds on Secrecy Capacity


In the other situations when $\beta > 1$, we cannot exactly calculate the value of $H(S_g^F)$. Instead, we
can only estimate the range of values that $H(S_g^F)$ can take.


4.2.1 The situations when $l_2 = t + 1$, $\frac{d-k+1}{t} \le \beta < \frac{d-k+1}{t-1}$.

Theorem 4. Given a linear MSR code, for $l_1 + l_2 \le k - 1$ and $l_2 = t + 1$, when $\frac{d-k+1}{t} \le \beta < \frac{d-k+1}{t-1}$, we
have

\[
B^{(s)} = (k - l_1 - l_2)(\alpha - \pi(\beta, l_2)),
\tag{57}
\]

where $\pi(\beta, l_2) = H(S_g^F) \ge t\beta + \frac{d-k-t+1}{d-k+1}\beta$ and $t \le d - k + 1$.

Proof. It basically follows from formulas (54) and (55) in Theorem 3.

When $\frac{d-k+1}{t} \le \beta < \frac{d-k+1}{t-1}$, Corollary 3 leads to $\pi(\beta, t) = t\beta$. According to the proof of Theorem
3, for any $i \notin [k + 1, k + t + 1]$, we have

\[
\begin{cases}
H(S_i^{[k+1,k+t]}) = t\beta, \\
H(S_i^{[k+1,k+t+1]}) = t\beta + \theta,
\end{cases}
\tag{58}
\]

where $\theta \in \mathbb{Z} \cap [0, \beta]$. Because $\beta \ge \frac{d-k+1}{t}$, when setting $\theta = \beta - 1$, we have $(d + 1 - k)(\beta - 1) \ge (d - k - t + 1)\beta$,
so we cannot obtain a contradiction from formulas (54) and (55). With them, we can only derive that
$\frac{d-k-t+1}{d-k+1}\beta \le \theta \le \beta$. Thus, we obtain

\[
\begin{aligned}
B^{(s)} &= (k - l_1 - l_2)(\alpha - \pi(\beta, l_2)) \\
&\le (k - l_1 - l_2)\Big\{\alpha - \Big(t\beta + \frac{d-k-t+1}{d-k+1}\beta\Big)\Big\},
\end{aligned}
\tag{59}
\]

where the equation in the first step follows from Remark 6 and the inequality in the second step results
from $\pi(\beta, l_2) = t\beta + \theta \ge t\beta + \frac{d-k-t+1}{d-k+1}\beta$.


Remark 11 Our focus in this paper is the secrecy capacity of MSR codes that can efficiently
repair all nodes under the eavesdropper model with $F \subseteq [1, d + 1]$. Unlike Category 1, the tightness of the bounds
in Theorem 4 remains unclear. In [33], the authors considered using the Zigzag code [23] to construct a secure
MSR code that attains the upper bound $B^{(s)} \le (k - l_1 - l_2)(\alpha - (2\beta - \frac{\beta}{n-k}))$, where $\alpha = (n - k)^k$ and
$|F| = l_2 = 2$. However, the Zigzag code is a systematic MSR code allowing efficient repair of systematic
nodes only, and the secure Zigzag code designed there is established on the premise that the eavesdropper
gains access to the repair data of $l_2$ systematic nodes, i.e., $F \subseteq [1, k]$.

Nevertheless, the simple and generally applicable upper bound $B^{(s)} \le (k - l_1 - l_2)(\alpha - H(S_g^F))$ given in
our Theorem 1 in fact also applies to systematic MSR codes, only requiring that $F \subseteq [1, k]$. First, it is clear
that the universal upper bound on secrecy capacity for any regenerating code, $B^{(s)} \le H(W_G \mid W_E, W_F) -
H(S^F \mid W_E, W_F)$ in Lemma 3, is applicable to systematic MSR codes, since they still have the reconstruction
property and the regeneration property for the systematic nodes $[1, k]$. Further, due to their minimum storage
feature, it can be similarly derived that $H(W_G \mid W_E, W_F) = H(W_G) = (k - l_1 - l_2)\alpha$. Second, based on the
fact that systematic MSR codes have the same parameter setting $\alpha = (d - k + 1)\beta$, one can check that
Lemma 4, Lemma 5 and Lemma 6 all apply to systematic MSR codes as well. Thus, Theorem 1 can be
applied to systematic MSR codes wherein $F \subseteq [1, k]$.

In the linear MSR scenario, although we assume $F = [k + 1, k + l_2]$ in the proofs of Theorem 2, Theorem
3 and Theorem 4, they are all applicable to linear systematic MSR codes, because their conditions do not
restrict $F$ to be included in $[k + 1, d + 1]$. Hence, systematic MSR
codes formally share the same secrecy capacity with MSR codes that efficiently repair all
nodes. Consequently, the bound in Theorem 4 also applies to linear systematic MSR codes and
is consistent with the bound given in [33] for a certain situation.

The secure Zigzag code presented there is designed by MRD pre-coding (Gabidulin code [37])
and is built on $\alpha = (n - k)^k$ and $l_2 = 2$. For Zigzag codes, when a systematic node fails, the remaining
$k - 1$ systematic nodes and all the $n - k$ parity nodes are required to participate in the repair, which implies
that $d = n - 1$. Thus, we have $\alpha = (d - k + 1)^k$ and $\beta = (d - k + 1)^{k-1} \ge d - k + 1$. According to our
Theorem 4, $t = 1$ satisfies the condition since $\beta = (d - k + 1)^{k-1} \ge d - k + 1$, which results in
$\pi(\beta, 2) \ge \beta + \frac{d-k}{d-k+1}\beta = 2\beta - \frac{\beta}{d-k+1}$. This exactly equals $2\beta - \frac{\beta}{n-k}$, the corresponding result of Corollary
16 given there. Furthermore, that Corollary 16 is clearly included in our Theorem 4, since Theorem 4 is
not only applicable to the situation $t = 1$.


4.2.2 The situations when $l_2 = t + e$, $e \ge 1$, $\frac{d-k+1}{t} \le \beta < \frac{d-k+1}{t-1}$.

Theorem 5. Given a linear MSR code, for $l_1 + l_2 \le k - 1$ and $l_2 = t + e$, when $\frac{d-k+1}{t} \le \beta < \frac{d-k+1}{t-1}$, we
have

\[
B^{(s)} = (k - l_1 - l_2)(\alpha - \pi(\beta, l_2)),
\tag{60}
\]

where $\pi(\beta, l_2) = H(S_g^F) \ge t\beta + \beta(d - k - t + 1)\big[1 - \big(\frac{d-k}{d-k+1}\big)^e\big]$ with $t \le d - k + 1$ and $e \ge 1$.

Proof. Without loss of generality, we assume the set $F$ is $[1, t + e]$, where $t + e + l_1 \le k - 1$. According
to Lemma 6, we know that for any $i \notin [1, t + e]$, $H(S_i^{[1,t+e]})$ is invariant.

Due to $\frac{d-k+1}{t} \le \beta < \frac{d-k+1}{t-1}$ and Corollary 3, we have

\[
\begin{aligned}
H(S_i^{[1,t+e]}) &= t\beta + H(S_i^{t+1} \mid S_i^{[1,t]}) + \dots + H(S_i^{t+e} \mid S_i^{[1,t+e-1]}) \\
&= t\beta + \theta_1 + \dots + \theta_e,
\end{aligned}
\tag{61}
\]

where $H(S_i^{t+j} \mid S_i^{[1,t+j-1]}) = \theta_j$ and $\theta_j \in \mathbb{Z} \cap [0, \beta]$, for $j \in [1, e]$. Again by Lemma 8, $H(S^{[1,t+e]})$ can be
expressed as

\[
H(S^{[1,t+e]}) = H\big(S^1_{[2,d+1]}, S^2_{[3,d+1]}, \dots, S^{t+e}_{[t+e+1,d+1]}\big).
\tag{62}
\]

First, with a method similar to the proofs of Theorem 2 and Theorem 3, we have

\[
\begin{aligned}
& H\big(S^{t+e}_{[t+e+1,d+1]} \mid S^{[1,t+e-1]}\big) \\
&= H(S^{[1,t+e]}) - H(S^{[1,t+e-1]}) \\
&= \{(t + e)\alpha + (k - t - e)[t\beta + \theta_1 + \dots + \theta_e]\} - \{(t + e - 1)\alpha + (k - t - e + 1)[t\beta + \theta_1 + \dots + \theta_{e-1}]\} \\
&= \alpha - [t\beta + \theta_1 + \dots + \theta_{e-1}] + (k - t - e)\theta_e.
\end{aligned}
\tag{63}
\]

Second, we obtain

\[
H\big(S^{t+e}_{[t+e+1,d+1]} \mid S^{[1,t+e-1]}\big) \le (d - t - e + 1)\theta_e,
\tag{64}
\]

which can be derived in the same way as inequality (55).

Then, combining equation (63) with (64), we derive $(d - k + 1)\theta_e \ge \alpha - [t\beta + \theta_1 + \dots + \theta_{e-1}]$, from
which we further have

\[
\theta_e \ge \frac{(d - k - t + 1)\beta}{d - k + 1} - \frac{\theta_1 + \dots + \theta_{e-1}}{d - k + 1}.
\tag{65}
\]

Through rearrangement, it can be changed to

\[
\theta_1 + \dots + \theta_e \ge \frac{(d - k - t + 1)\beta}{d - k + 1} + \frac{(d - k)(\theta_1 + \dots + \theta_{e-1})}{d - k + 1}.
\tag{66}
\]

By setting $\omega(e) = \theta_1 + \dots + \theta_e$, we obtain

\[
\omega(e) \ge \frac{d - k}{d - k + 1}\,\omega(e - 1) + \frac{(d - k - t + 1)\beta}{d - k + 1}.
\tag{67}
\]

From Theorem 4, we know $\theta_1 \ge \frac{d-k-t+1}{d-k+1}\beta$. Hence, by recursion and induction, we have

\[
\omega(e) \ge \beta(d - k - t + 1)\Big[1 - \Big(\frac{d - k}{d - k + 1}\Big)^e\Big].
\tag{68}
\]

To this end, we have $\pi(\beta, l_2) = H(S_i^{[1,t+e]}) = t\beta + \omega(e) \ge t\beta + \beta(d - k - t + 1)\big[1 - \big(\frac{d-k}{d-k+1}\big)^e\big]$.
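
The passage from the recursion (67) to the closed form (68) can be verified with exact rational arithmetic. The sketch below is illustrative and the function names are hypothetical: it iterates (67) with equality, starting from $\omega(0) = 0$ so that $\omega(1)$ equals the bound on $\theta_1$ from Theorem 4, and compares the result with the right-hand side of (68).

```python
from fractions import Fraction


def omega_lower_recursive(d, k, t, beta, e):
    """Iterate (67) with equality, starting from omega(0) = 0."""
    r = Fraction(d - k, d - k + 1)
    c = Fraction((d - k - t + 1) * beta, d - k + 1)
    w = Fraction(0)
    for _ in range(e):
        w = r * w + c
    return w


def omega_lower_closed(d, k, t, beta, e):
    """Right-hand side of (68)."""
    r = Fraction(d - k, d - k + 1)
    return beta * (d - k - t + 1) * (1 - r ** e)


if __name__ == "__main__":
    # Illustrative parameters: d - k + 1 = 3, t = 1, beta = 9.
    for e in range(1, 5):
        print(e, omega_lower_recursive(6, 4, 1, 9, e), omega_lower_closed(6, 4, 1, 9, e))
```

The agreement is the geometric-series identity $c(1 + r + \dots + r^{e-1}) = c\,\frac{1 - r^e}{1 - r}$ with $r = \frac{d-k}{d-k+1}$ and $c = \frac{(d-k-t+1)\beta}{d-k+1}$.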

Remark 12 Theorem^ is the supplementary of Theorem [7J which expands the range of values that I 2 
can he taken from. In Theorem [7j e ordy ccm fee taken by 1, while Theorem\S \ takes e by any value only 
needing to satisfy li + t + e<k~l and t < d — fc + 1. As stated in R.em,ark \lll Theorem\5\ basically also 
applies to systematic MSR codes for F £ [1, fc]. 

In fact, the upper bound given in JJ2j is also a special case of our Theorem [5] Zigzag codes by 
pre-coding of MRD codes are shown to be able to achieve this bound on secrecy capacity in [32 1 Since 
a = (d + 1 — k) k , we know P = (d + 1 — fc) fc_1 > d + 1 — fc, which similarly indicates that only t = 1 
conforms the constraints on P required by our Theorem 0 Thereby, we have I 2 = e + 1, from which we 
derive 7 r(/3, 12 ) > P + P{d — fc) [l — { d d f k l f i y 2 ~ 1 ] ■ By simplification, we further have 


' n(P,l 2 ) 

= a~P{d — k){- d ^ ' l2 ~ 1 


iY 


(69) 


= a — a(l — 


’ d—k + 

1 -)* 2 , 


d — fc + 1' 


which leads to that B^ < (fc — l\ — Z 2 )(l — ^ exac ^V consistent with the bound given in 

i-e., £ (s) < (fc - h - / 2 )(1 - ^)' 2 a. 

Although some upper bounds in this category (limited to $t = 1$ or $\beta \ge d-k+1$) are achievable for the Zigzag codes considered in [32,33], they are not generally achievable for all other MSR codes with $\{\beta \ge d-k+1\}$. For example, consider vector MSR codes with $\{\beta \ge d-k+1\}$ designed by concatenating $m$ identical scalar MSR codes, where $m \ge d-k+1$. By the earlier corollary, their secrecy capacity is exactly $(k-l_1-l_2)(\alpha - l_2\beta)$ with $\beta = m$, since each scalar MSR code within a vector MSR code shares the same code construction and the component codes can be designed to be mutually independent. It is obvious (see the footnote) that $(k-l_1-l_2)(\alpha-l_2\beta) < (k-l_1-l_2)\left(1-\frac{1}{d-k+1}\right)^{l_2}\alpha$, which means that those vector MSR codes cannot reach the bounds in Theorem 5. Therefore, unlike Category 1, the value of $H(S_g)$ or $\pi(\beta,l_2)$ in Category 2 cannot be determined in general, i.e. its precise value may vary with the construction of the MSR code.
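The gap between concatenated codes and the $t = 1$ bound can be illustrated numerically. The sketch below is a hedged example and not part of the paper: the function names and the parameter choice ($k = 3$, $d = 5$, $m = 3$) are assumptions made purely for illustration. It evaluates the concatenated-code capacity $(k-l_1-l_2)(\alpha-l_2\beta)$ with $\alpha = m(d-k+1)$, $\beta = m$, and the bound $(k-l_1-l_2)\left(1-\frac{1}{d-k+1}\right)^{l_2}\alpha$.

```python
from fractions import Fraction


def concatenated_capacity(k, d, m, l1, l2):
    """Secrecy capacity of m concatenated scalar MSR codes: alpha = m*(d-k+1), beta = m."""
    alpha, beta = m * (d - k + 1), m
    return (k - l1 - l2) * (alpha - l2 * beta)


def t1_upper_bound(k, d, alpha, l1, l2):
    """Upper bound (k-l1-l2) * (1 - 1/(d-k+1))^l2 * alpha obtained for t = 1."""
    return (k - l1 - l2) * (1 - Fraction(1, d - k + 1)) ** l2 * alpha


# Illustrative parameters (assumed): k = 3, d = 5, m = 3, so alpha = 9, beta = 3.
k, d, m = 3, 5, 3
alpha = m * (d - k + 1)
print(concatenated_capacity(k, d, m, 0, 2), t1_upper_bound(k, d, alpha, 0, 2))
```

With these parameters and $l_1 = 0$, $l_2 = 2$, the concatenated code gives 3 while the bound gives 4, so the bound is not met. For $l_2 = 1$ (which lies outside Category 2, since $e \ge 1$ is required) the two expressions coincide.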


4.3 Putting Together 


Combining the two categories of results on the secrecy capacity of linear MSR codes, we now give the following comprehensive and explicit result on the secrecy capacity of any linear MSR code with $\{n = d + 1\}$.


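Footnote: In the category where $l_2 = t + e$ and $\frac{d-k+1}{t} \le \beta < \frac{d-k+1}{t-1}$, we have $B^{(s)} \le (k-l_1-l_2)\left\{\alpha - t\beta - \beta(d-k-t+1)\left[1-\left(\frac{d-k}{d-k+1}\right)^{e}\right]\right\}$. One can check that $\beta(d-k-t+1)\left[1-\left(\frac{d-k}{d-k+1}\right)^{e}\right] \le \beta(d-k-t+1)\left[e\left(1-\frac{d-k}{d-k+1}\right)\right] = e\beta\left(1-\frac{t}{d-k+1}\right) = e\beta - \frac{e\beta t}{d-k+1} \le e\beta - e < e\beta$. So we derive $t\beta + \beta(d-k-t+1)\left[1-\left(\frac{d-k}{d-k+1}\right)^{e}\right] < (t+e)\beta = l_2\beta$, which leads to $(k-l_1-l_2)(\alpha-l_2\beta) < (k-l_1-l_2)\left(1-\frac{1}{d-k+1}\right)^{l_2}\alpha$ when $t = 1$.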
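
Theorem 6. Given a linear MSR code with $\{n = d+1, k, d, \alpha, \beta\}$, for $l_1 + l_2 \le k - 1$, we have

$$B^{(s)} = (k-l_1-l_2)\left(\alpha - \pi(\beta, l_2)\right), \tag{70}$$

where

$$\pi(\beta, l_2) \begin{cases} = l_2\beta, & \text{if } l_2 \le t,\ \beta < \frac{d-k+1}{t-1}, \\ \ge t\beta + \beta(d-k-t+1)\left[1-\left(\frac{d-k}{d-k+1}\right)^{e}\right], & \text{if } l_2 = t+e,\ \frac{d-k+1}{t} \le \beta < \frac{d-k+1}{t-1}, \end{cases} \tag{71}$$

where $1 \le t \le d-k+1$ and $e \ge 1$.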


Remark 13 In the literature, the known linear MSR codes comprise the scalar MSR codes with $\{\beta = 1\}$ and the vector MSR codes with $\{\beta = (d-k+1)^x\}$ where $x \ge 1$. It should be noted that these vector MSR codes are not designed by concatenating scalar MSR codes. Similar to Zigzag codes [23], they share the same intersection pattern, i.e. the same dependence exists within disjoint sets of repair data transmitted from any one node (e.g. $S_g$ where $g \notin F$). Thus, the second item in formula (71) also applies to them.


4.4 Further Discussions 

Theorem 6 gives a comprehensive and explicit result on secrecy capacity for any linear MSR code given the parameter set $\{n = d+1, k, d, \alpha, \beta\}$ and the $(l_1, l_2)$-eavesdropper model. In retrospect, all known constructions of linear MSR codes are based on the scalar case $\beta = 1$ or on particular vector cases where $\beta$ is required to be a power of $d-k+1$. Thus, designing linear vector MSR codes with $\{1 < \beta < d-k+1\}$ without concatenation remains open. Nevertheless, Theorem 6 still gives certain results on secrecy capacity for these unexplored MSR codes. We therefore put forward the following two questions.

Question 1. Do linear MSR codes with $\{1 < \beta < d-k+1\}$ that are not built by concatenation exist? If so, how can we construct them?

Question 2. Given such a construction with $\{1 < \beta < d-k+1\}$, are the bounds given in formula (71) achievable when $l_2 \ge 1 + \frac{d-k+1}{\beta}$?

Remark 14 According to formula (71), for linear MSR codes with $\{1 < \beta < d-k+1\}$ built without concatenation, when $l_2 < 1 + \frac{d-k+1}{\beta}$, it must be that

$$B^{(s)} = (k-l_1-l_2)(\alpha - l_2\beta). \tag{72}$$

However, when $l_2 \ge 1 + \frac{d-k+1}{\beta}$, formula (71) leads to

$$B^{(s)} \le (k-l_1-l_2)\left(\alpha - t\beta - \beta(d-k-t+1)\left[1-\left(\frac{d-k}{d-k+1}\right)^{l_2-t}\right]\right), \tag{73}$$

where $t = \frac{d-k+1}{\beta}$ if $\beta$ divides $d-k+1$, and $t = \left\lfloor 1 + \frac{d-k+1}{\beta} \right\rfloor$ if $\beta$ does not divide $d-k+1$. Hence, as in Question 2, we ask whether the upper bound (73) is achievable for such a code.
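The case split in Remark 14 can be sketched in Python as follows. This is a hedged illustration under our reading of formulas (72) and (73), not a construction from the paper: the function names, the returned labels "exact" and "upper", and the example parameters are all assumptions introduced here. The relation $\alpha = (d-k+1)\beta$ of MSR codes is used to obtain $\alpha$.

```python
from fractions import Fraction


def choose_t(d, k, beta):
    """t as described in Remark 14: (d-k+1)/beta if beta divides d-k+1, else floor(1 + (d-k+1)/beta)."""
    r = d - k + 1
    if r % beta == 0:
        return r // beta
    return 1 + r // beta


def secrecy_capacity_estimate(k, d, beta, l1, l2):
    """Return ("exact", value) in the regime of (72) or ("upper", bound) in the regime of (73)."""
    if l1 + l2 > k - 1:
        raise ValueError("requires l1 + l2 <= k - 1")
    r = d - k + 1
    alpha = r * beta  # MSR relation alpha = (d-k+1)*beta
    if l2 < 1 + Fraction(r, beta):
        # Regime of (72): precise value
        return "exact", Fraction((k - l1 - l2) * (alpha - l2 * beta))
    # Regime of (73): upper bound with e = l2 - t >= 1
    t = choose_t(d, k, beta)
    x = Fraction(d - k, r)
    pi_lower = t * beta + beta * (d - k - t + 1) * (1 - x ** (l2 - t))
    return "upper", (k - l1 - l2) * (alpha - pi_lower)


# Illustrative parameters (assumed): k = 6, d = 9, beta = 2, so d-k+1 = 4 and alpha = 8.
print(secrecy_capacity_estimate(6, 9, 2, 1, 2))
print(secrecy_capacity_estimate(6, 9, 2, 1, 3))
```

For these assumed parameters, $t = 2$. With $l_1 = 1$, $l_2 = 2$ the sketch returns the exact value 12 from (72), and with $l_2 = 3$ it returns the upper bound 6 from (73). Whether the latter is achievable is exactly what Question 2 asks.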

Overall, Theorem 6 predicts certain results on secrecy capacity for these unexplored MSR codes, namely the precise value of secrecy capacity (72) and the upper bound on secrecy capacity (73).


5 CONCLUSION 

In this paper, we study the secrecy capacity of MSR codes. We assume a passive adversarial model in which the eavesdropper can observe the contents of certain nodes and the repair data of some other nodes. Although the secrecy capacity of MBR codes has been characterized completely [30], analyzing the secrecy capacity of MSR codes remains challenging [31,32,33]. The additional difficulty comes from the fact that the data transmitted to repair a failed node in an MSR code is not entirely stored on the node undergoing repair, which makes it challenging to compute the joint entropy of the repair data. Under this system model, we investigate the repair data in the MSR scenario from the information-theoretic perspective.

We first obtain some information-theoretic properties and some upper bounds on secrecy capacity for general MSR codes, and we introduce the new concept of stable MSR codes. For unstable MSR codes, we assume the eavesdropper can identify the nodes whose repair data is captured, and we demonstrate that their secrecy capacity is strictly less than that of stable MSR codes. In the linear MSR scenario, we use permutation polynomials and orthogonal systems over finite fields to explain why the entropy of multiple sets of repair data is an integer, and we ultimately derive a comprehensive and explicit result on secrecy capacity that depends closely on the value of $\beta$. This result not only explains and generalizes the previous results in [31,32,33], but also predicts certain results for some unexplored linear MSR codes, about which we put forward two related questions. Finally, we find that all of these results also apply to systematic MSR codes when the repair data of systematic nodes is captured.